Preface


This book is based on a two-semester sequence of courses taught to incoming 
graduate students at the University of Illinois at Urbana-Champaign, pri- 
marily physics students but also some from other branches of the physical 
sciences. The courses aim to introduce students to some of the mathematical 
methods and concepts that they will find useful in their research. We have 
sought to enliven the material by integrating the mathematics with its appli- 
cations. We therefore provide illustrative examples and problems drawn from 
physics. Some of these illustrations are classical but many are small parts of 
contemporary research papers. In the text and at the end of each chapter we 
provide a collection of exercises and problems suitable for homework assign- 
ments. The former are straightforward applications of material presented 
in the text; the latter are intended to be interesting, and take rather more 
thought and time. 


We devote the first, and longest, part (Chapters 1 to 9, and the first 
semester in the classroom) to traditional mathematical methods. We explore 
the analogy between linear operators acting on function spaces and matrices 
acting on finite dimensional spaces, and use the operator language to pro- 
vide a unified framework for working with ordinary differential equations, 
partial differential equations, and integral equations. The mathematical pre- 
requisites are a sound grasp of undergraduate calculus (including the vector 
calculus needed for electricity and magnetism courses), elementary linear al- 
gebra, and competence at complex arithmetic. Fourier sums and integrals, as 
well as basic ordinary differential equation theory, receive a quick review, but 
it would help if the reader had some prior experience to build on. Contour 
integration is not required for this part of the book. 


The second part (Chapters 10 to 14) focuses on modern differential ge- 
ometry and topology, with an eye to its application to physics. The tools of 
calculus on manifolds, especially the exterior calculus, are introduced, and 
used to investigate classical mechanics, electromagnetism, and non-abelian
gauge fields. The language of homology and cohomology is introduced and 
is used to investigate the influence of the global topology of a manifold on 
the fields that live in it and on the solutions of differential equations that 
constrain these fields. 

Chapters 15 and 16 introduce the theory of group representations and 
their applications to quantum mechanics. Both finite groups and Lie groups 
are explored. 

The last part (Chapters 17 to 19) explores the theory of complex variables 
and its applications. Although much of the material is standard, we make use 
of the exterior calculus, and discuss rather more of the topological aspects of 
analytic functions than is customary. 

A cursory reading of the Contents of the book will show that there is 
more material here than can be comfortably covered in two semesters. When 
using the book as the basis for lectures in the classroom, we have found it 
useful to tailor the presented material to the interests of our students. 
Chapter 1


Calculus of Variations 


We begin our tour of useful mathematics with what is called the calculus of 
variations. Many physics problems can be formulated in the language of this 
calculus, and once they are, there are useful tools to hand. In the text and
associated exercises we will meet some of the equations whose solution will 
occupy us for much of our journey. 


1.1 What is it good for? 


The classical problems that motivated the creators of the calculus of varia- 
tions include: 

i) Dido’s problem: In Virgil’s Aeneid we read how Queen Dido of Carthage
must find the largest area that can be enclosed by a curve (a strip of bull’s
hide) of fixed length.

ii) Plateau’s problem: Find the surface of minimum area for a given set of 
bounding curves. A soap film on a wire frame will adopt this minimal- 
area configuration. 

iii) Johann Bernoulli’s Brachistochrone: A bead slides down a curve with
fixed ends. Assuming that the total energy $\frac{1}{2}mv^2 + V(x)$ is constant,
find the curve that gives the most rapid descent.

iv) Catenary: Find the form of a hanging heavy chain of fixed length by 
minimizing its potential energy. 

These problems all involve finding maxima or minima, and hence equating 
some sort of derivative to zero. In the next section we define this derivative, 
and show how to compute it. 
1.2 Functionals


In variational problems we are provided with an expression $J[y]$ that “eats”
whole functions $y(x)$ and returns a single number. Such objects are called
functionals to distinguish them from ordinary functions. An ordinary function
is a map $f:\mathbb{R}\to\mathbb{R}$. A functional $J$ is a map
$J: C^\infty(\mathbb{R})\to\mathbb{R}$, where $C^\infty(\mathbb{R})$ is the
space of smooth (having derivatives of all orders) functions.
To find the function $y(x)$ that maximizes or minimizes a given functional
$J[y]$ we need to define, and evaluate, its functional derivative.


1.2.1 The functional derivative 


We restrict ourselves to expressions of the form 
$$J[y] = \int_{x_1}^{x_2} f(x, y, y', y'', \ldots, y^{(n)})\,dx, \tag{1.1}$$


where f depends on the value of y(x) and only finitely many of its derivatives. 
Such functionals are said to be local in x. 

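Consider first a functional $J = \int f\,dx$ in which $f$ depends only on $x$, $y$ and
$y'$. Make a change $y(x) \to y(x) + \varepsilon\eta(x)$, where $\varepsilon$ is a (small) $x$-independent
constant. The resultant change in $J$ is

$$
\begin{aligned}
J[y+\varepsilon\eta] - J[y] &= \int_{x_1}^{x_2} \left\{f(x, y+\varepsilon\eta, y'+\varepsilon\eta') - f(x, y, y')\right\} dx \\
&= \int_{x_1}^{x_2} \left\{\varepsilon\eta\frac{\partial f}{\partial y} + \varepsilon\frac{d\eta}{dx}\frac{\partial f}{\partial y'} + O(\varepsilon^2)\right\} dx \\
&= \left[\varepsilon\eta\frac{\partial f}{\partial y'}\right]_{x_1}^{x_2} + \int_{x_1}^{x_2} (\varepsilon\eta(x))\left\{\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right\} dx + O(\varepsilon^2).
\end{aligned}
$$

If $\eta(x_1) = \eta(x_2) = 0$, the variation $\delta y(x) \equiv \varepsilon\eta(x)$ in $y(x)$ is said to have
“fixed endpoints.” For such variations the integrated-out part $[\ldots]_{x_1}^{x_2}$ vanishes.
Defining $\delta J$ to be the $O(\varepsilon)$ part of $J[y + \varepsilon\eta] - J[y]$, we have

$$\delta J = \int_{x_1}^{x_2} (\varepsilon\eta(x))\left\{\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right\} dx = \int_{x_1}^{x_2} \delta y(x)\left(\frac{\delta J}{\delta y(x)}\right) dx. \tag{1.2}$$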
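The function

$$\frac{\delta J}{\delta y(x)} \equiv \frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) \tag{1.3}$$

is called the functional (or Fréchet) derivative of $J$ with respect to $y(x)$. We
can think of it as a generalization of the partial derivative $\partial J/\partial y_i$, where the
discrete subscript “i” on $y$ is replaced by a continuous label “x,” and sums
over $i$ are replaced by integrals over $x$:

$$\delta J = \sum_i \frac{\partial J}{\partial y_i}\,\delta y_i \;\to\; \int_{x_1}^{x_2} dx\,\frac{\delta J}{\delta y(x)}\,\delta y(x). \tag{1.4}$$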
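The analogy between the functional derivative and an ordinary partial derivative can be checked numerically. The sketch below is illustrative only: the integrand $f = \frac{1}{2}y'^2 + \frac{1}{2}y^2$, the grid, and the function names are assumptions chosen for the example, not taken from the text. It replaces the integral by a Riemann sum, differentiates with respect to a single grid value $y_k$, divides by the grid spacing, and compares the result with $\partial f/\partial y - \frac{d}{dx}(\partial f/\partial y') = y - y''$.

```python
import math

def discrete_J(y, h):
    """Riemann-sum version of J[y] = integral of (y'^2/2 + y^2/2) dx on a uniform grid."""
    total = 0.0
    for i in range(len(y) - 1):
        slope = (y[i + 1] - y[i]) / h          # forward-difference estimate of y'
        total += (0.5 * slope**2 + 0.5 * y[i]**2) * h
    return total

def functional_derivative(y, h, k, eps=1e-4):
    """(1/h) * dJ/dy_k by central difference: the discrete analogue of dJ/dy(x_k)."""
    up = list(y)
    up[k] += eps
    dn = list(y)
    dn[k] -= eps
    return (discrete_J(up, h) - discrete_J(dn, h)) / (2 * eps * h)

n = 400
h = math.pi / n
xs = [i * h for i in range(n + 1)]
ys = [math.sin(x) for x in xs]                 # trial function y = sin x
k = n // 4                                     # an interior grid point
numeric = functional_derivative(ys, h, k)
exact = 2.0 * math.sin(xs[k])                  # y - y'' evaluated for y = sin x
print(numeric, exact)
```

The division by `h` is the discrete counterpart of replacing the sum over $i$ by an integral over $x$.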


1.2.2 The Euler-Lagrange equation 


Suppose that we have a differentiable function $J(y_1, y_2, \ldots, y_n)$ of $n$ variables
and seek its stationary points, these being the locations at which $J$ has its
maxima, minima and saddlepoints. At a stationary point $(y_1, y_2, \ldots, y_n)$ the
variation

$$\delta J = \sum_{i=1}^{n} \frac{\partial J}{\partial y_i}\,\delta y_i \tag{1.5}$$

must be zero for all possible $\delta y_i$. The necessary and sufficient condition for
this is that all partial derivatives $\partial J/\partial y_i$, $i = 1, \ldots, n$ be zero. By analogy,
we expect that a functional $J[y]$ will be stationary under fixed-endpoint
variations $y(x) \to y(x) + \delta y(x)$, when the functional derivative $\delta J/\delta y(x)$ vanishes
for all $x$. In other words, when

$$\frac{\partial f}{\partial y(x)} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'(x)}\right) = 0, \quad x_1 < x < x_2. \tag{1.6}$$


The condition (1.6) for y(x) to be a stationary point is usually called the 
Euler-Lagrange equation. 

That $\delta J/\delta y(x) \equiv 0$ is a sufficient condition for $\delta J$ to be zero is clear
from its definition in (1.2). To see that it is a necessary condition we must
appeal to the assumed smoothness of $y(x)$. Consider a function $y(x)$ at which
$J[y]$ is stationary but where $\delta J/\delta y(x)$ is non-zero at some $x_0 \in [x_1, x_2]$.
Because $f(y, y', x)$ is smooth, the functional derivative $\delta J/\delta y(x)$ is also a
smooth function of $x$. Therefore, by continuity, it will have the same sign
throughout some open interval containing $x_0$. By taking $\delta y(x) = \varepsilon\eta(x)$ to be
zero outside this interval, and of one sign within it, we obtain a non-zero $\delta J$,
in contradiction to stationarity. In making this argument, we see why it
was essential to integrate by parts so as to take the derivative off $\delta y$: when
$y$ is fixed at the endpoints, we have $\int \delta y'\,dx = 0$, and so we cannot find a $\delta y'$
that is zero everywhere outside an interval and of one sign within it.

Figure 1.1: Soap film between two rings.

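When the functional depends on more than one function $y$, then stationarity
under all possible variations requires one equation

$$\frac{\delta J}{\delta y_i(x)} = \frac{\partial f}{\partial y_i} - \frac{d}{dx}\left(\frac{\partial f}{\partial y_i'}\right) = 0 \tag{1.7}$$

for each function $y_i(x)$.
If the function $f$ depends on higher derivatives, $y''$, $y^{(3)}$, etc., then we
have to integrate by parts more times, and we end up with

$$0 = \frac{\delta J}{\delta y(x)} = \frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) + \frac{d^2}{dx^2}\left(\frac{\partial f}{\partial y''}\right) - \frac{d^3}{dx^3}\left(\frac{\partial f}{\partial y^{(3)}}\right) + \cdots. \tag{1.8}$$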


1.2.3. Some applications 


Now we use our new functional derivative to address some of the classic 
problems mentioned in the introduction. 

Example: Soap film supported by a pair of coaxial rings (figure 1.1). This
is a simple case of Plateau’s problem. The free energy of the soap film is
equal to twice (once for each liquid-air interface) the surface tension $\sigma$ of the
soap solution times the area of the film. The film can therefore minimize its
free energy by minimizing its area, and the axial symmetry suggests that the
minimal surface will be a surface of revolution about the $x$ axis. We therefore
seek the profile $y(x)$ that makes the area

$$J[y] = 2\pi \int_{x_1}^{x_2} y\sqrt{1 + y'^2}\,dx \tag{1.9}$$

of the surface of revolution the least among all such surfaces bounded by
the circles of radii $y(x_1) = y_1$ and $y(x_2) = y_2$. Because a minimum is a
stationary point, we seek candidates for the minimizing profile $y(x)$ by setting
the functional derivative $\delta J/\delta y(x)$ to zero.

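We begin by forming the partial derivatives of the free-energy integrand
$f = 4\pi\sigma\,y\sqrt{1 + y'^2}$, which is $2\sigma$ times the area integrand and so has the
same stationary profiles:

$$\frac{\partial f}{\partial y} = 4\pi\sigma\sqrt{1 + y'^2}, \qquad \frac{\partial f}{\partial y'} = \frac{4\pi\sigma\,y y'}{\sqrt{1 + y'^2}}, \tag{1.10}$$

and use them to write down the Euler-Lagrange equation

$$\sqrt{1 + y'^2} - \frac{d}{dx}\left(\frac{y y'}{\sqrt{1 + y'^2}}\right) = 0. \tag{1.11}$$

Performing the indicated derivative with respect to $x$ gives

$$\sqrt{1 + y'^2} - \frac{(y')^2}{\sqrt{1 + y'^2}} - \frac{y y''}{\sqrt{1 + y'^2}} + \frac{y (y')^2 y''}{(1 + y'^2)^{3/2}} = 0. \tag{1.12}$$

After collecting terms, this simplifies to

$$\frac{1}{\sqrt{1 + y'^2}} - \frac{y y''}{(1 + y'^2)^{3/2}} = 0. \tag{1.13}$$

The differential equation (1.13) still looks a trifle intimidating. To simplify
further, we multiply by $y'$ to get

$$0 = \frac{y'}{\sqrt{1 + y'^2}} - \frac{y y' y''}{(1 + y'^2)^{3/2}} = \frac{d}{dx}\left(\frac{y}{\sqrt{1 + y'^2}}\right). \tag{1.14}$$

The solution to the minimization problem therefore reduces to solving

$$\frac{y}{\sqrt{1 + y'^2}} = \kappa, \tag{1.15}$$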
where $\kappa$ is an as yet undetermined integration constant. Fortunately this
non-linear, first order, differential equation is elementary. We recast it as

$$\frac{dy}{dx} = \sqrt{\frac{y^2}{\kappa^2} - 1} \tag{1.16}$$

and separate variables

$$\int dx = \int \frac{dy}{\sqrt{\dfrac{y^2}{\kappa^2} - 1}}. \tag{1.17}$$

We now make the natural substitution $y = \kappa\cosh t$, whence

$$\int dx = \kappa\int dt. \tag{1.18}$$

Thus we find that $x + a = \kappa t$, leading to

$$y = \kappa\cosh\frac{x + a}{\kappa}. \tag{1.19}$$

We select the constants $\kappa$ and $a$ to fit the endpoints $y(x_1) = y_1$ and
$y(x_2) = y_2$.

Figure 1.2: Hanging chain 


Example: Heavy Chain over Pulleys. We cannot yet consider the form of 
the catenary, a hanging chain of fixed length, but we can solve a simpler 
problem of a heavy flexible cable draped over a pair of pulleys located at 
$x = \pm L$, $y = h$, and with the excess cable resting on a horizontal surface as
illustrated in figure 1.2. 


Figure 1.3: Intersection of y = ht/L with y = cosht. 


The potential energy of the system is

$$\text{P.E.} = \sum mgy = \rho g\int_{-L}^{L} y\sqrt{1 + (y')^2}\,dx + \text{const.} \tag{1.20}$$

Here the constant refers to the unchanging potential energy

$$2\times\int_0^h mgy\,dy = mgh^2 \tag{1.21}$$

of the vertically hanging cable. The potential energy of the cable lying on the
horizontal surface is zero because $y$ is zero there. Notice that the tension in
the suspended cable is being tacitly determined by the weight of the vertical
segments.

The Euler-Lagrange equations coincide with those of the soap film, so

$$y = \kappa\cosh\frac{(x + a)}{\kappa} \tag{1.22}$$

where we have to find $\kappa$ and $a$. We have

$$h = \kappa\cosh\frac{(-L + a)}{\kappa}, \qquad h = \kappa\cosh\frac{(L + a)}{\kappa}, \tag{1.23}$$

so $a = 0$ and $h = \kappa\cosh(L/\kappa)$. Setting $t = L/\kappa$ this reduces to

$$\left(\frac{h}{L}\right) t = \cosh t. \tag{1.24}$$

Figure 1.4: Bead on a wire.

By considering the intersection of the line $y = ht/L$ with $y = \cosh t$ (figure
1.3) we see that if $h/L$ is too small there is no solution (the weight of the
suspended cable is too big for the tension supplied by the dangling ends)
and once $h/L$ is large enough there will be two possible solutions. Further
investigation will show that the solution with the larger value of $\kappa$ is a point
of stable equilibrium, while the solution with the smaller $\kappa$ is unstable.
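The graphical argument for the pulley problem can be reproduced with a few lines of root finding. The sketch below is an illustration under stated assumptions: the helper names `bisect` and `chain_roots` are hypothetical, and the sample ratios $h/L = 1$ and $h/L = 2$ are chosen only to show the two regimes. It locates the minimum of $\cosh t - (h/L)t$ and brackets a root on either side of it when the minimum is negative. Since $\kappa = L/t$, the smaller root $t$ corresponds to the larger $\kappa$.

```python
import math

def bisect(g, lo, hi, tol=1e-12):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == (g_lo > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chain_roots(h_over_L):
    """Positive solutions t of (h/L) t = cosh t, in increasing order."""
    g = lambda t: math.cosh(t) - h_over_L * t
    t_min = math.asinh(h_over_L)          # g'(t) = sinh t - h/L vanishes here
    if g(t_min) > 0:
        return []                         # the line lies below cosh t everywhere
    t_hi = 2.0 * t_min
    while g(t_hi) < 0:                    # walk right until cosh t wins again
        t_hi *= 2.0
    return [bisect(g, 0.0, t_min), bisect(g, t_min, t_hi)]

print(chain_roots(1.0))   # pulleys too low: no equilibrium
print(chain_roots(2.0))   # two candidate equilibria
```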
Example: The Brachistochrone. This problem was posed as a challenge by 
Johann Bernoulli in 1696. He asked what shape should a wire with endpoints 
(0,0) and (a, b) take in order that a frictionless bead will slide from rest down 
the wire in the shortest possible time (figure 1.4). The problem’s name comes 
from Greek: βράχιστος means shortest and χρόνος means time.

When presented with an ostensibly anonymous solution, Johann made his 
famous remark: “Tanquam ex ungue leonem” (I recognize the lion by his
clawmark) meaning that he recognized that the author was Isaac Newton. 

Johann gave a solution himself, but that of his brother Jacob Bernoulli 
was superior and Johann tried to pass it off as his. This was not atypical. 
Johann later misrepresented the publication date of his book on hydraulics 
to make it seem that he had priority in this field over his own son, Daniel 
Bernoulli. 


Figure 1.5: A wheel rolls on the x axis. The dot, which is fixed to the rim of 
the wheel, traces out a cycloid. 


We begin our solution of the problem by observing that the total energy

$$E = \frac{1}{2}m(\dot x^2 + \dot y^2) - mgy = \frac{1}{2}m\dot x^2(1 + y'^2) - mgy \tag{1.25}$$

of the bead is constant. From the initial condition we see that this constant
is zero. We therefore wish to minimize

$$T = \int_0^T dt = \int_0^a \frac{1}{\dot x}\,dx = \int_0^a \sqrt{\frac{1 + y'^2}{2gy}}\,dx \tag{1.26}$$

so as to find $y(x)$, given that $y(0) = 0$ and $y(a) = b$. The Euler-Lagrange
equation is

$$y y'' + \frac{1}{2}(1 + y'^2) = 0. \tag{1.27}$$

Again this looks intimidating, but we can use the same trick of multiplying
through by $y'$ to get

$$y'\left(y y'' + \frac{1}{2}(1 + y'^2)\right) = \frac{1}{2}\frac{d}{dx}\left\{y(1 + y'^2)\right\} = 0. \tag{1.28}$$

Thus

$$2c = y(1 + y'^2). \tag{1.29}$$

This differential equation has a parametric solution

$$x = c(\theta - \sin\theta), \qquad y = c(1 - \cos\theta), \tag{1.30}$$

(as you should verify) and the solution is the cycloid shown in figure 1.5.
The parameter $c$ is determined by requiring that the curve does in fact pass
through the point $(a, b)$.
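To see the cycloid beat a competing path, one can time a bead along a fine polyline approximation of each curve. The sketch below rests on several assumptions that are not in the text: the bead is taken to pass the corners of the polyline without losing speed, $y$ is measured downward, and the values $c = 1$, $g = 9.81$ and the endpoint $\theta = \pi$ are arbitrary. On a straight segment the acceleration is constant, so the mean speed is the average of the endpoint speeds.

```python
import math

def descent_time(points, g=9.81):
    """Time for a bead released from rest at points[0] to slide along a polyline.

    y is measured downward, so the speed at depth y is sqrt(2 g y). On each
    straight segment the acceleration is constant, so the mean speed is the
    average of the two endpoint speeds.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        v0, v1 = math.sqrt(2 * g * y0), math.sqrt(2 * g * y1)
        total += 2.0 * length / (v0 + v1)
    return total

c, g, n = 1.0, 9.81, 4000
thetas = [math.pi * i / n for i in range(n + 1)]
cycloid = [(c * (t - math.sin(t)), c * (1 - math.cos(t))) for t in thetas]
line = [cycloid[0], cycloid[-1]]          # straight wire between the same endpoints

t_cycloid = descent_time(cycloid, g)
t_line = descent_time(line, g)
print(t_cycloid, t_line, math.pi * math.sqrt(c / g))
```

Along the cycloid $ds/v = \sqrt{c/g}\,d\theta$, so the exact descent time to $\theta = \pi$ is $\pi\sqrt{c/g}$, which the polyline estimate should approach as `n` grows.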


1.2.4 First integral 


How did we know that we could simplify both the soap-film problem and 
the brachistochrone by multiplying the Euler equation by y’? The answer 
is that there is a general principle, closely related to energy conservation in 
mechanics, that tells us when and how we can make such a simplification. 
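The $y'$ trick works when the $f$ in $\int f\,dx$ is of the form $f(y, y')$, i.e. has no
explicit dependence on $x$. In this case the last term in

$$\frac{df}{dx} = y'\frac{\partial f}{\partial y} + y''\frac{\partial f}{\partial y'} + \frac{\partial f}{\partial x} \tag{1.31}$$

is absent. We then have

$$\begin{aligned}
\frac{d}{dx}\left(f - y'\frac{\partial f}{\partial y'}\right) &= y'\frac{\partial f}{\partial y} + y''\frac{\partial f}{\partial y'} - y''\frac{\partial f}{\partial y'} - y'\frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) \\
&= y'\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right),
\end{aligned} \tag{1.32}$$

and this is zero if the Euler-Lagrange equation is satisfied.
The quantity

$$I = f - y'\frac{\partial f}{\partial y'} \tag{1.33}$$

is called a first integral of the Euler-Lagrange equation. In the soap-film case

$$f - y'\frac{\partial f}{\partial y'} = y\sqrt{1 + (y')^2} - \frac{y(y')^2}{\sqrt{1 + (y')^2}} = \frac{y}{\sqrt{1 + (y')^2}}. \tag{1.34}$$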

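When there are a number of dependent variables $y_i$, so that we have

$$J[y_1, y_2, \ldots, y_n] = \int f(y_1, y_2, \ldots, y_n; y_1', y_2', \ldots, y_n')\,dx \tag{1.35}$$

then the first integral becomes

$$I = f - \sum_i y_i'\frac{\partial f}{\partial y_i'}. \tag{1.36}$$

Again

$$\begin{aligned}
\frac{dI}{dx} &= \frac{d}{dx}\left(f - \sum_i y_i'\frac{\partial f}{\partial y_i'}\right) \\
&= \sum_i\left(y_i'\frac{\partial f}{\partial y_i} + y_i''\frac{\partial f}{\partial y_i'} - y_i''\frac{\partial f}{\partial y_i'} - y_i'\frac{d}{dx}\left(\frac{\partial f}{\partial y_i'}\right)\right) \\
&= \sum_i y_i'\left(\frac{\partial f}{\partial y_i} - \frac{d}{dx}\left(\frac{\partial f}{\partial y_i'}\right)\right),
\end{aligned} \tag{1.37}$$

and this is zero if the Euler-Lagrange equation is satisfied for each $y_i$.
Note that there is only one first integral, no matter how many $y_i$’s there
are.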


1.3. Lagrangian mechanics 


In his Mécanique Analytique (1788) Joseph-Louis de La Grange, following 
Jean d’Alembert (1742) and Pierre de Maupertuis (1744), showed that most 
of classical mechanics can be recast as a variational condition: the principle 
of least action. The idea is to introduce the Lagrangian function $L = T - V$
where $T$ is the kinetic energy of the system and $V$ the potential energy, both
expressed in terms of generalized co-ordinates $q^i$ and their time derivatives
$\dot q^i$. Then, Lagrange showed, the multitude of Newton’s $\mathbf{F} = m\mathbf{a}$ equations,
one for each particle in the system, can be reduced to

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q^i}\right) - \frac{\partial L}{\partial q^i} = 0, \tag{1.38}$$

one equation for each generalized coordinate $q$. Quite remarkably, given
that Lagrange’s derivation contains no mention of maxima or minima, we
recognise that this is precisely the condition that the action functional

$$S[q] = \int_{t_{\rm initial}}^{t_{\rm final}} L(t, q^i; \dot q^i)\,dt \tag{1.39}$$

be stationary with respect to variations of the trajectory $q^i(t)$ that leave the
initial and final points fixed. This fact so impressed its discoverers that they 
believed they had uncovered the unifying principle of the universe. Mauper- 
tuis, for one, tried to base a proof of the existence of God on it. Today the 
action integral, through its starring role in the Feynman path-integral for- 
mulation of quantum mechanics, remains at the heart of theoretical physics. 
Figure 1.6: Atwood’s machine.


1.3.1 One degree of freedom 


We shall not attempt to derive Lagrange’s equations from d’Alembert’s ex- 
tension of the principle of virtual work — leaving this task to a mechanics 
course — but instead satisfy ourselves with some examples which illustrate 
the computational advantages of Lagrange’s approach, as well as a subtle 
pitfall. 

Consider, for example, Atwood’s Machine (figure 1.6). This device, invented
in 1784 but still a familiar sight in teaching laboratories, is used to
demonstrate Newton’s laws of motion and to measure $g$. It consists of two
weights connected by a light string of length $l$ which passes over a light and
frictionless pulley.

The elementary approach is to write an equation of motion for each of
the two weights

$$m_1\ddot x_1 = m_1 g - T, \qquad m_2\ddot x_2 = m_2 g - T. \tag{1.40}$$

We then take into account the constraint $\dot x_1 = -\dot x_2$ and eliminate $\ddot x_2$ in favour
of $\ddot x_1$:

$$m_1\ddot x_1 = m_1 g - T, \qquad -m_2\ddot x_1 = m_2 g - T. \tag{1.41}$$

Finally we eliminate the constraint force, the tension $T$, and obtain the
acceleration

$$(m_1 + m_2)\ddot x_1 = (m_1 - m_2)g. \tag{1.42}$$

Lagrange’s solution takes the constraint into account from the very beginning
by introducing a single generalized coordinate $q = x_1 = l - x_2$, and
writing

$$L = T - V = \frac{1}{2}(m_1 + m_2)\dot q^2 - (m_2 - m_1)gq. \tag{1.43}$$

From this we obtain a single equation of motion

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right) - \frac{\partial L}{\partial q} = 0 \quad\Rightarrow\quad (m_1 + m_2)\ddot q = (m_1 - m_2)g. \tag{1.44}$$

The advantage of the Lagrangian method is that constraint forces, which
do no net work, never appear. The disadvantage is exactly the same: if we
need to find the constraint forces (in this case the tension in the string)
we cannot use Lagrange alone.

Lagrange provides a convenient way to derive the equations of motion in 
non-cartesian co-ordinate systems, such as plane polar co-ordinates. 




Figure 1.7: Polar components of acceleration. 


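Consider the central force problem with $F_r = -\partial_r V(r)$. Newton’s method
begins by computing the acceleration in polar coordinates. This is most
easily done by setting $z = re^{i\theta}$ and differentiating twice:

$$\begin{aligned}
\dot z &= (\dot r + ir\dot\theta)e^{i\theta}, \\
\ddot z &= (\ddot r - r\dot\theta^2)e^{i\theta} + i(2\dot r\dot\theta + r\ddot\theta)e^{i\theta}.
\end{aligned} \tag{1.45}$$

Reading off the components parallel and perpendicular to $e^{i\theta}$ gives the radial
and angular acceleration

$$a_r = \ddot r - r\dot\theta^2, \qquad a_\theta = r\ddot\theta + 2\dot r\dot\theta. \tag{1.46}$$

Newton’s equations therefore become

$$m(\ddot r - r\dot\theta^2) = -\frac{\partial V}{\partial r}, \qquad m(r\ddot\theta + 2\dot r\dot\theta) = 0 \;\Rightarrow\; \frac{d}{dt}(mr^2\dot\theta) = 0. \tag{1.47}$$

Setting $l = mr^2\dot\theta$, the conserved angular momentum, and eliminating $\dot\theta$ gives

$$m\ddot r - \frac{l^2}{mr^3} = -\frac{\partial V}{\partial r}. \tag{1.48}$$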

(If this were Kepler’s problem, where $V = -GmM/r$, we would now proceed
to simplify this equation by substituting $r = 1/u$, but that is another story.)
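Following Lagrange we first compute the kinetic energy in polar coordinates
(this requires less thought than computing the acceleration) and set

$$L = T - V = \frac{1}{2}m(\dot r^2 + r^2\dot\theta^2) - V(r). \tag{1.49}$$

The Euler-Lagrange equations are now

$$\begin{aligned}
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot r}\right) - \frac{\partial L}{\partial r} &= 0 \quad\Rightarrow\quad m\ddot r - mr\dot\theta^2 + \frac{\partial V}{\partial r} = 0, \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta}\right) - \frac{\partial L}{\partial \theta} &= 0 \quad\Rightarrow\quad \frac{d}{dt}(mr^2\dot\theta) = 0,
\end{aligned} \tag{1.50}$$

and coincide with Newton’s.
The first integral is

$$E = \dot r\frac{\partial L}{\partial \dot r} + \dot\theta\frac{\partial L}{\partial \dot\theta} - L = \frac{1}{2}m(\dot r^2 + r^2\dot\theta^2) + V(r), \tag{1.51}$$

which is the total energy. Thus the constancy of the first integral states that

$$\frac{dE}{dt} = 0 \tag{1.52}$$

or that energy is conserved.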
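Both conserved quantities can be watched in a direct numerical integration of Newton's equations. The sketch below is an assumption-laden illustration rather than anything from the text: it takes the attractive potential $V(r) = -k/r$ with $k = m = 1$, works in Cartesian coordinates, and uses a classical fourth-order Runge-Kutta step with a small time step. The angular momentum is computed as $m(x\dot y - y\dot x)$, which equals $mr^2\dot\theta$.

```python
import math

def accel(x, y, k=1.0, m=1.0):
    """Newtonian acceleration for V(r) = -k/r (an assumed attractive potential)."""
    r3 = (x * x + y * y) ** 1.5
    return -k * x / (m * r3), -k * y / (m * r3)

def rk4_step(state, dt):
    """One classical Runge-Kutta step for state = (x, y, vx, vy)."""
    def deriv(s):
        ax, ay = accel(s[0], s[1])
        return (s[2], s[3], ax, ay)
    def shifted(s, d, f):
        return tuple(si + f * di for si, di in zip(s, d))
    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def energy(s, k=1.0, m=1.0):
    """Kinetic plus potential energy, the first integral E."""
    return 0.5 * m * (s[2] ** 2 + s[3] ** 2) - k / math.hypot(s[0], s[1])

def angular_momentum(s, m=1.0):
    """m (x vy - y vx), which equals m r^2 theta-dot."""
    return m * (s[0] * s[3] - s[1] * s[2])

state = (1.0, 0.0, 0.0, 1.2)             # arbitrary bound-orbit initial data
e0, l0 = energy(state), angular_momentum(state)
for _ in range(10000):
    state = rk4_step(state, 1e-3)
e1, l1 = energy(state), angular_momentum(state)
print(e1 - e0, l1 - l0)                  # both drifts should be tiny
```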

Warning: We might realize, without having gone to the trouble of deriving
it from the Lagrange equations, that rotational invariance guarantees that
the angular momentum $l = mr^2\dot\theta$ is constant. Having done so, it is almost
irresistible to try to short-circuit some of the labour by plugging this prior
knowledge into

$$L = \frac{1}{2}m(\dot r^2 + r^2\dot\theta^2) - V(r) \tag{1.53}$$

so as to eliminate the variable $\dot\theta$ in favour of the constant $l$. If we try this we
get

$$L \stackrel{?}{\to} \frac{1}{2}m\dot r^2 + \frac{l^2}{2mr^2} - V(r). \tag{1.54}$$

We can now directly write down the Lagrange equation for $r$, which is

$$m\ddot r + \frac{l^2}{mr^3} \stackrel{?}{=} -\frac{\partial V}{\partial r}. \tag{1.55}$$

Unfortunately this has the wrong sign before the $l^2/mr^3$ term! The lesson is
that we must be very careful in using consequences of a variational principle
to modify the principle. It can be done, and in mechanics it leads to the
Routhian or, in more modern language, to Hamiltonian reduction, but it
requires using a Legendre transform. The reader should consult a book on
mechanics for details.
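The sign error has an immediate physical symptom: the naive equation admits no circular orbits. The short sketch below compares the radial acceleration predicted by (1.48) and by (1.55) at the radius of a circular orbit, under the assumption $V(r) = -k/r$ with illustrative values $m = k = 1$ and $r_0 = 2$; the function names are hypothetical.

```python
def rddot_correct(r, l, m=1.0, k=1.0):
    """Radial acceleration from (1.48): m r'' - l^2/(m r^3) = -dV/dr, with V = -k/r."""
    return (l**2 / (m * r**3) - k / r**2) / m

def rddot_naive(r, l, m=1.0, k=1.0):
    """Radial acceleration from the substituted Lagrangian, equation (1.55)."""
    return (-l**2 / (m * r**3) - k / r**2) / m

m, k, r0 = 1.0, 1.0, 2.0
l = (m * k * r0) ** 0.5          # angular momentum of a circular orbit of radius r0
print(rddot_correct(r0, l), rddot_naive(r0, l))
```

With the correct sign the centrifugal term balances the attraction and the radial acceleration vanishes at $r_0$; with the wrong sign both terms pull inward.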


1.3.2 Noether’s theorem 


The time-independence of the first integral

$$\frac{d}{dt}\left\{\dot q\frac{\partial L}{\partial \dot q} - L\right\} = 0, \tag{1.56}$$

and of angular momentum

$$\frac{d}{dt}\left\{\frac{\partial L}{\partial \dot\theta}\right\} = 0, \tag{1.57}$$

are examples of conservation laws. We obtained them both by manipulating
the Euler-Lagrange equations of motion, but also indicated that they were
in some way connected with symmetries. One of the chief advantages of a
variational formulation of a physical problem is that this connection

$$\text{Symmetry} \;\Rightarrow\; \text{Conservation Law}$$


can be made explicit by exploiting a strategy due to Emmy Noether. She
showed how to proceed directly from the action integral to the conserved
quantity without having to fiddle about with the individual equations of
motion. We begin by illustrating her technique in the case of angular momentum,
whose conservation is a consequence of the rotational symmetry of
the central force problem. The action integral for the central force problem
is

$$S = \int_0^T \left\{\frac{1}{2}m(\dot r^2 + r^2\dot\theta^2) - V(r)\right\} dt. \tag{1.58}$$

Noether observes that the integrand is left unchanged if we make the variation

$$\theta(t) \to \theta(t) + \varepsilon\alpha \tag{1.59}$$

where $\alpha$ is a fixed angle and $\varepsilon$ is a small, time-independent, parameter. This
invariance is the symmetry we shall exploit. It is a mathematical identity:
it does not require that $r$ and $\theta$ obey the equations of motion. She next
observes that since the equations of motion are equivalent to the statement
that $S$ is left stationary under any infinitesimal variations in $r$ and $\theta$, they
necessarily imply that $S$ is stationary under the specific variation

$$\theta(t) \to \theta(t) + \varepsilon(t)\alpha \tag{1.60}$$

where now $\varepsilon$ is allowed to be time-dependent. This stationarity of the action
is no longer a mathematical identity, but, because it requires $r$, $\theta$, to obey
the equations of motion, has physical content. Inserting $\delta\theta = \varepsilon(t)\alpha$ into our
expression for $S$ gives

$$\delta S = \alpha\int_0^T \left\{mr^2\dot\theta\right\}\dot\varepsilon\,dt. \tag{1.61}$$

Note that this variation depends only on the time derivative of $\varepsilon$, and not $\varepsilon$
itself. This is because of the invariance of $S$ under time-independent rotations.
We now assume that $\varepsilon(t) = 0$ at $t = 0$ and $t = T$, and integrate by
parts to take the time derivative off $\varepsilon$ and put it on the rest of the integrand:

$$\delta S = -\alpha\int_0^T \left\{\frac{d}{dt}(mr^2\dot\theta)\right\}\varepsilon(t)\,dt. \tag{1.62}$$

Since the equations of motion say that $\delta S = 0$ under all infinitesimal variations,
and in particular those due to any time dependent rotation $\varepsilon(t)\alpha$, we
deduce that the equations of motion imply that the coefficient of $\varepsilon(t)$ must
be zero, and so, provided $r(t)$, $\theta(t)$, obey the equations of motion, we have

$$0 = \frac{d}{dt}(mr^2\dot\theta). \tag{1.63}$$


—— 
at 


As a second illustration we derive energy (first integral) conservation for
the case that the system is invariant under time translations, meaning
that $L$ does not depend explicitly on time. In this case the action integral
is invariant under constant time shifts $t \to t + \varepsilon$ in the argument of the
dynamical variable:

$$q(t) \to q(t + \varepsilon) = q(t) + \varepsilon\dot q + \cdots. \tag{1.64}$$

The equations of motion tell us that the action will be stationary under
the variation

$$\delta q(t) = \varepsilon(t)\dot q, \tag{1.65}$$

where again we now permit the parameter $\varepsilon$ to depend on $t$. We insert this
variation into

$$S = \int_0^T L\,dt \tag{1.66}$$

and find

$$\delta S = \int_0^T \left\{\frac{\partial L}{\partial q}\varepsilon\dot q + \frac{\partial L}{\partial \dot q}(\varepsilon\ddot q + \dot q\dot\varepsilon)\right\} dt. \tag{1.67}$$

This expression contains undotted $\varepsilon$’s. Because of this the change in $S$ is not
obviously zero when $\varepsilon$ is time independent, but the absence of any explicit
$t$ dependence in $L$ tells us that

$$\frac{dL}{dt} = \frac{\partial L}{\partial q}\dot q + \frac{\partial L}{\partial \dot q}\ddot q. \tag{1.68}$$

As a consequence, for time independent $\varepsilon$, we have

$$\delta S = \int_0^T \left\{\varepsilon\frac{dL}{dt}\right\} dt = \varepsilon[L]_0^T, \tag{1.69}$$

showing that the change in $S$ comes entirely from the endpoints of the time
interval. These fixed endpoints explicitly break time-translation invariance,
but in a trivial manner. For general $\varepsilon(t)$ we have

$$\delta S = \int_0^T \left\{\varepsilon(t)\frac{dL}{dt} + \frac{\partial L}{\partial \dot q}\dot q\dot\varepsilon\right\} dt. \tag{1.70}$$

This equation is an identity. It does not rely on $q$ obeying the equation of
motion. After an integration by parts, taking $\varepsilon(t)$ to be zero at $t = 0, T$, it
is equivalent to

$$\delta S = \int_0^T \varepsilon(t)\frac{d}{dt}\left\{L - \frac{\partial L}{\partial \dot q}\dot q\right\} dt. \tag{1.71}$$

Now we assume that $q(t)$ does obey the equations of motion. The variation
principle then says that $\delta S = 0$ for any $\varepsilon(t)$, and we deduce that for $q(t)$
satisfying the equations of motion we have

$$\frac{d}{dt}\left\{L - \frac{\partial L}{\partial \dot q}\dot q\right\} = 0. \tag{1.72}$$

The general strategy that constitutes “Noether’s theorem” must now be 
obvious: we look for an invariance of the action under a symmetry trans- 
formation with a time-independent parameter. We then observe that if the 
dynamical variables obey the equations of motion, then the action principle 
tells us that the action will remain stationary under such a variation of the 
dynamical variables even after the parameter is promoted to being time de- 
pendent. The resultant variation of S' can only depend on time derivatives of 
the parameter. We integrate by parts so as to take all the time derivatives off 
it, and on to the rest of the integrand. Because the parameter is arbitrary, 
we deduce that the equations of motion tell us that its coefficient in the
integral must be zero. This coefficient is the time derivative of something, so 
this something is conserved. 


1.3.3 Many degrees of freedom 


The extension of the action principle to many degrees of freedom is
straightforward. As an example consider the small oscillations about equilibrium of
a system with $N$ degrees of freedom. We parametrize the system in terms of
deviations from the equilibrium position and expand out to quadratic order.
We obtain a Lagrangian

$$L = \sum_{i,j=1}^{N}\left\{\frac{1}{2}M_{ij}\dot q^i\dot q^j - \frac{1}{2}V_{ij}q^iq^j\right\}, \tag{1.73}$$

where $M_{ij}$ and $V_{ij}$ are $N \times N$ symmetric matrices encoding the inertial and
potential energy properties of the system. Now we have one equation

$$0 = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q^i}\right) - \frac{\partial L}{\partial q^i} = \sum_{j=1}^{N}\left(M_{ij}\ddot q^j + V_{ij}q^j\right) \tag{1.74}$$

for each $i$.
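For $N = 2$ the coupled equations (1.74) can be solved by hand, and the following sketch does so in code. It is an illustration under assumptions not made in the text: a trial solution $q^j = a^j\cos\omega t$, a positive-definite $V_{ij}$ (so the equilibrium is stable and $\omega^2 \ge 0$), and an example system of two unit masses joined to walls and to each other by unit springs. The function name is hypothetical.

```python
import math

def normal_mode_frequencies(M, V):
    """Angular frequencies for a 2x2 system M q'' + V q = 0.

    Trying q = a cos(omega t) gives det(V - omega^2 M) = 0, a quadratic in
    lam = omega^2 whose coefficients are written out below.
    """
    a = M[0][0] * M[1][1] - M[0][1] ** 2
    b = -(V[0][0] * M[1][1] + V[1][1] * M[0][0] - 2 * V[0][1] * M[0][1])
    c = V[0][0] * V[1][1] - V[0][1] ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    lams = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
    return [math.sqrt(lam) for lam in lams]

# Two unit masses joined to walls and to each other by unit springs.
M = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, -1.0], [-1.0, 2.0]]
freqs = normal_mode_frequencies(M, V)
print(freqs)
```

The two frequencies correspond to the in-phase and out-of-phase motions of the masses.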


1.3.4 Continuous systems 


The action principle can be extended to field theories and to continuum me- 
chanics. Here one has a continuous infinity of dynamical degrees of freedom, 
either one for each point in space and time or one for each point in the mate- 
rial, but the extension of the variational derivative to functions of more than 
one variable should pose no conceptual difficulties.

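Suppose we are given an action functional $S[\varphi]$ depending on a field $\varphi(x^\mu)$
and its first derivatives

$$\varphi_\mu \equiv \frac{\partial\varphi}{\partial x^\mu}. \tag{1.75}$$

Here $x^\mu$, $\mu = 0, 1, \ldots, d$, are the coordinates of $d+1$ dimensional space-time.
It is traditional to take $x^0 \equiv t$ and the other coordinates spacelike. Suppose
further that

$$S[\varphi] = \int L\,dt = \int \mathcal{L}(x^\mu, \varphi, \varphi_\mu)\,d^{d+1}x, \tag{1.76}$$

where $\mathcal{L}$ is the Lagrangian density, in terms of which

$$L = \int \mathcal{L}\,d^dx, \tag{1.77}$$

and the integral is over the space coordinates. Now

$$\begin{aligned}
\delta S &= \int\left\{\delta\varphi(x)\frac{\partial\mathcal{L}}{\partial\varphi(x)} + \delta(\varphi_\mu(x))\frac{\partial\mathcal{L}}{\partial\varphi_\mu(x)}\right\} d^{d+1}x \\
&= \int \delta\varphi(x)\left\{\frac{\partial\mathcal{L}}{\partial\varphi(x)} - \frac{\partial}{\partial x^\mu}\left(\frac{\partial\mathcal{L}}{\partial\varphi_\mu(x)}\right)\right\} d^{d+1}x.
\end{aligned} \tag{1.78}$$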
In going from the first line to the second, we have observed that

$$\delta(\varphi_\mu(x)) = \frac{\partial}{\partial x^\mu}\delta\varphi(x) \tag{1.79}$$

and used the divergence theorem,

$$\int_\Omega \left(\frac{\partial A^\mu}{\partial x^\mu}\right) d^{d+1}x = \int_{\partial\Omega} A^\mu n_\mu\,dS, \tag{1.80}$$

where $\Omega$ is some space-time region and $\partial\Omega$ its boundary, to integrate by
parts. Here $dS$ is the element of area on the boundary, and $n_\mu$ the outward
normal. As before, we take $\delta\varphi$ to vanish on the boundary, and hence there
is no boundary contribution to the variation of $S$. The result is that

$$\frac{\delta S}{\delta\varphi(x)} = \frac{\partial\mathcal{L}}{\partial\varphi(x)} - \frac{\partial}{\partial x^\mu}\left(\frac{\partial\mathcal{L}}{\partial\varphi_\mu(x)}\right), \tag{1.81}$$

and the equation of motion comes from setting this to zero. Note that a sum
over the repeated coordinate index $\mu$ is implied. In practice it is easier not to
use this formula. Instead, make the variation by hand, as in the following
examples.

Example: The Vibrating string. The simplest continuous dynamical system
is the transversely vibrating string. We describe the string displacement by
$y(x, t)$.

Figure 1.8: Transversely vibrating string


Let us suppose that the string has fixed ends, a mass per unit length
of $\rho$, and is under tension $T$. If we assume only small displacements from
equilibrium, the Lagrangian is

$$L = \int_0^L dx\left\{\frac{1}{2}\rho\dot y^2 - \frac{1}{2}Ty'^2\right\}. \tag{1.82}$$
The dot denotes a partial derivative with respect to $t$, and the prime a partial
derivative with respect to $x$. The variation of the action is

$$\begin{aligned}
\delta S &= \iint_0^L dt\,dx\left\{\rho\dot y\,\delta\dot y - Ty'\,\delta y'\right\} \\
&= \iint_0^L dt\,dx\left\{\delta y(x, t)\left(-\rho\ddot y + Ty''\right)\right\}.
\end{aligned} \tag{1.83}$$

To reach the second line we have integrated by parts, and, because the ends
are fixed, and therefore $\delta y = 0$ at $x = 0$ and $L$, there is no boundary term.
Requiring that $\delta S = 0$ for all allowed variations $\delta y$ then gives the equation
of motion

$$\rho\ddot y - Ty'' = 0. \tag{1.84}$$

This is the wave equation describing transverse waves propagating with speed
$c = \sqrt{T/\rho}$. Observe that from (1.83) we can read off the functional derivative
of $S$ with respect to the variable $y(x, t)$ as being

$$\frac{\delta S}{\delta y(x, t)} = -\rho\ddot y(x, t) + Ty''(x, t). \tag{1.85}$$

In writing down the first integral for this continuous system, we must
replace the sum over discrete indices by an integral:

$$E = \sum_i \dot q_i\frac{\partial L}{\partial \dot q_i} - L \;\to\; \int_0^L dx\left\{\dot y(x)\frac{\delta L}{\delta\dot y(x)}\right\} - L. \tag{1.86}$$

When computing $\delta L/\delta\dot y(x)$ from

$$L = \int_0^L dx\left\{\frac{1}{2}\rho\dot y^2 - \frac{1}{2}Ty'^2\right\},$$

we must remember that it is the continuous analogue of $\partial L/\partial\dot q_i$, and so, in
contrast to what we do when computing $\delta S/\delta y(x)$, we must treat $\dot y(x)$ as a
variable independent of $y(x)$. We then have

$$\frac{\delta L}{\delta\dot y(x)} = \rho\dot y(x), \tag{1.87}$$

leading to

$$E = \int_0^L dx\left\{\frac{1}{2}\rho\dot y^2 + \frac{1}{2}Ty'^2\right\}. \tag{1.88}$$

This, as expected, is the total energy, kinetic plus potential, of the string.
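The constancy of this energy is easy to confirm for a single standing wave. The sketch below assumes the fundamental mode $y = A\sin(kx)\cos(\omega t)$ with $k = \pi/L$ and $\omega = ck$, which satisfies the wave equation and the fixed-end conditions; the numerical values of $\rho$, $T$, $L$, $A$ and the sample times are arbitrary illustrative choices. It evaluates the energy integral by the midpoint rule at several times and compares with the closed form $\frac{1}{4}LTA^2k^2$.

```python
import math

rho, T, L, A = 2.0, 8.0, 1.0, 0.01      # illustrative values, not from the text
c = math.sqrt(T / rho)                  # wave speed
k = math.pi / L                         # fundamental mode, y = 0 at x = 0 and L
omega = c * k

def string_energy(t, n=2000):
    """Midpoint-rule estimate of E = integral of (rho/2) ydot^2 + (T/2) y'^2."""
    dx = L / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        ydot = -A * omega * math.sin(k * x) * math.sin(omega * t)
        yprime = A * k * math.cos(k * x) * math.cos(omega * t)
        total += (0.5 * rho * ydot**2 + 0.5 * T * yprime**2) * dx
    return total

energies = [string_energy(t) for t in (0.0, 0.1, 0.37, 1.0)]
expected = 0.25 * L * T * (A * k) ** 2
print(energies, expected)
```

The kinetic and potential contributions trade places during the oscillation, but their sum stays fixed because $\rho\omega^2 = Tk^2$.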
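The energy-momentum tensor

If we consider an action of the form

$$S = \int \mathcal{L}(\varphi, \varphi_\mu)\,d^{d+1}x, \tag{1.89}$$

in which $\mathcal{L}$ does not depend explicitly on any of the co-ordinates $x^\mu$, we may
refine Noether’s derivation of the law of conservation of total energy and obtain
accounting information about the position-dependent energy density. To do
this we make a variation of the form

$$\varphi(x) \to \varphi(x^\mu + \varepsilon^\mu(x)) = \varphi(x^\mu) + \varepsilon^\mu(x)\partial_\mu\varphi + O(|\varepsilon|^2), \tag{1.90}$$

where $\varepsilon$ depends on $x \equiv (x^0, \ldots, x^d)$. The resulting variation in $S$ is

$$\begin{aligned}
\delta S &= \int\left\{\frac{\partial\mathcal{L}}{\partial\varphi}\varepsilon^\mu\partial_\mu\varphi + \frac{\partial\mathcal{L}}{\partial\varphi_\nu}\partial_\nu(\varepsilon^\mu\partial_\mu\varphi)\right\} d^{d+1}x \\
&= \int \varepsilon^\mu(x)\frac{\partial}{\partial x^\nu}\left\{\mathcal{L}\delta^\nu_\mu - \frac{\partial\mathcal{L}}{\partial\varphi_\nu}\partial_\mu\varphi\right\} d^{d+1}x.
\end{aligned} \tag{1.91}$$

When $\varphi$ satisfies the equations of motion this $\delta S$ will be zero for arbitrary
$\varepsilon^\mu(x)$. We conclude that

$$\frac{\partial}{\partial x^\nu}\left\{\mathcal{L}\delta^\nu_\mu - \frac{\partial\mathcal{L}}{\partial\varphi_\nu}\partial_\mu\varphi\right\} = 0. \tag{1.92}$$

The $(d+1)$-by-$(d+1)$ array of functions

$$T^\nu{}_\mu \equiv \frac{\partial\mathcal{L}}{\partial\varphi_\nu}\partial_\mu\varphi - \delta^\nu_\mu\mathcal{L} \tag{1.93}$$

is known as the canonical energy-momentum tensor because the statement

$$\partial_\nu T^\nu{}_\mu = 0 \tag{1.94}$$

often provides book-keeping for the flow of energy and momentum.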
In the case of the vibrating string, the $\mu = 0, 1$ components of $\partial_\nu T^\nu{}_\mu = 0$
become the two following local conservation equations:

$$ \frac{\partial}{\partial t}\left\{ \frac{\rho}{2}\dot{y}^2 + \frac{T}{2}y'^2 \right\} + \frac{\partial}{\partial x}\left\{ -T\dot{y}y' \right\} = 0, \tag{1.95} $$

and

$$ \frac{\partial}{\partial t}\left\{ -\rho\dot{y}y' \right\} + \frac{\partial}{\partial x}\left\{ \frac{\rho}{2}\dot{y}^2 + \frac{T}{2}y'^2 \right\} = 0. \tag{1.96} $$

It is easy to verify that these are indeed consequences of the wave equation.
They are “local” conservation laws because they are of the form

$$ \frac{\partial q}{\partial t} + \operatorname{div}\mathbf{J} = 0, \tag{1.97} $$

where $q$ is the local density, and $\mathbf{J}$ the flux, of the globally conserved quantity
$Q = \int q\, d^dx$. In the first case, the local density $q$ is

$$ T^0{}_0 = \frac{\rho}{2}\dot{y}^2 + \frac{T}{2}y'^2, \tag{1.98} $$

which is the energy density. The energy flux is given by $T^1{}_0 \equiv -T\dot{y}y'$, which
is the rate at which a segment of string is doing work on its neighbour to the right.
Integrating over $x$, and observing that the fixed-end boundary conditions are
such that

$$ \int_0^L \frac{\partial}{\partial x}\left\{ -T\dot{y}y' \right\} dx = \left[ -T\dot{y}y' \right]_0^L = 0, \tag{1.99} $$

gives us

$$ \frac{d}{dt}\int_0^L \left\{ \frac{\rho}{2}\dot{y}^2 + \frac{T}{2}y'^2 \right\} dx = 0, \tag{1.100} $$

which is the global energy conservation law we obtained earlier.
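
As a quick numerical illustration of the global conservation law, the following sketch evaluates the string energy for the fundamental standing wave $y = \sin(\pi x/L)\cos(\pi c t/L)$ at several times. The parameter values, the choice of mode, and the names `energy`, `energies` and `exact` are assumptions made for this example only; they do not come from the text.

```python
import math

# Hypothetical parameters chosen only for this check.
RHO, T, L = 2.0, 8.0, 1.0        # mass density, tension, string length
C = math.sqrt(T / RHO)           # wave speed c = sqrt(T/rho)
K = math.pi / L                  # wavenumber of the fundamental mode
OMEGA = C * K                    # its angular frequency

def energy(t, n=2000):
    """Midpoint-rule estimate of E = int_0^L (rho/2 ydot^2 + T/2 y'^2) dx
    for the standing wave y = sin(K x) cos(OMEGA t)."""
    dx = L / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        ydot = -OMEGA * math.sin(K * x) * math.sin(OMEGA * t)
        yprime = K * math.cos(K * x) * math.cos(OMEGA * t)
        total += (0.5 * RHO * ydot**2 + 0.5 * T * yprime**2) * dx
    return total

# The energy should be the same at every sampled time.
energies = [energy(t) for t in (0.0, 0.1, 0.37, 1.0)]
exact = T * math.pi**2 / (4 * L)   # closed form for this particular mode
```

The kinetic and potential contributions trade places as $t$ varies, but their sum stays at the value `exact`, as the conservation law requires.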

The physical interpretation of $T^0{}_1 = -\rho\dot{y}y'$, the locally conserved
quantity appearing in (1.96), is less obvious. If this were a relativistic system,
we would immediately identify $\int T^0{}_1\,dx$ as the $x$-component of the
energy-momentum 4-vector, and therefore $T^0{}_1$ as the density of $x$-momentum. Now
any real string will have some motion in the $x$ direction, but the magnitude
of this motion will depend on the string’s elastic constants and other
quantities unknown to our Lagrangian. Because of this, the $T^0{}_1$ derived
from $L$ cannot be the string’s $x$-momentum density. Instead, it is the density
of something called pseudo-momentum. The distinction between true
and pseudo-momentum is best appreciated by considering the corresponding
Noether symmetry. The symmetry associated with Newtonian momentum
is the invariance of the action integral under an $x$ translation of the
entire apparatus: the string, and any wave on it. The symmetry associated
with pseudo-momentum is the invariance of the action under a shift
$y(x) \to y(x - a)$ of the location of the wave on the string, the string
itself not being translated. Newtonian momentum is conserved if the ambient
space is translationally invariant. Pseudo-momentum is conserved only if the
string is translationally invariant, i.e. if $\rho$ and $T$ are position independent.
A failure to realize that the presence of a medium (here the string) requires us
to distinguish between these two symmetries is the origin of much confusion
involving “wave momentum.”


Maxwell’s equations 


Michael Faraday and James Clerk Maxwell’s description of electromagnetism
in terms of dynamical vector fields gave us the first modern field
theory. D’Alembert and Maupertuis would have been delighted to discover 
that the famous equations of Maxwell’s A Treatise on Electricity and Mag- 
netism (1873) follow from an action principle. There is a slight complication 
stemming from gauge invariance but, as long as we are not interested in ex- 
hibiting the covariance of Maxwell under Lorentz transformations, we can 
sweep this under the rug by working in the axial gauge, where the scalar 
electric potential does not appear. 
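We will start from Maxwell’s equations

$$ \begin{aligned}
\operatorname{div}\mathbf{B} &= 0, \\
\operatorname{curl}\mathbf{E} &= -\frac{\partial\mathbf{B}}{\partial t}, \\
\operatorname{curl}\mathbf{H} &= \mathbf{J} + \frac{\partial\mathbf{D}}{\partial t}, \\
\operatorname{div}\mathbf{D} &= \rho,
\end{aligned} \tag{1.101} $$

and show that they can be obtained from an action principle. For convenience
we shall use natural units in which $\mu_0 = \epsilon_0 = 1$, and so $c = 1$ and $\mathbf{D} \equiv \mathbf{E}$
and $\mathbf{B} \equiv \mathbf{H}$.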

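The first equation $\operatorname{div}\mathbf{B} = 0$ contains no time derivatives. It is a
constraint which we satisfy by introducing a vector potential $\mathbf{A}$ such that
$\mathbf{B} = \operatorname{curl}\mathbf{A}$. If we set

$$ \mathbf{E} = -\frac{\partial\mathbf{A}}{\partial t}, \tag{1.102} $$

then this automatically implies Faraday’s law of induction

$$ \operatorname{curl}\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}. \tag{1.103} $$
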
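We now guess that the Lagrangian is

$$ L = \int d^3x \left[ \tfrac12\left\{ \mathbf{E}^2 - \mathbf{B}^2 \right\} + \mathbf{J}\cdot\mathbf{A} \right]. \tag{1.104} $$

The motivation is that $L$ looks very like $T - V$ if we regard $\tfrac12\mathbf{E}^2 \equiv \tfrac12\dot{\mathbf{A}}^2$ as
being the kinetic energy and $\tfrac12\mathbf{B}^2 \equiv \tfrac12(\operatorname{curl}\mathbf{A})^2$ as being the potential energy.
The term in $\mathbf{J}$ represents the interaction of the fields with an external current
source. In the axial gauge the electric charge density $\rho$ does not appear in
the Lagrangian. The corresponding action is therefore

$$ S = \int L\,dt = \iint d^3x \left[ \tfrac12\dot{\mathbf{A}}^2 - \tfrac12(\operatorname{curl}\mathbf{A})^2 + \mathbf{J}\cdot\mathbf{A} \right] dt. \tag{1.105} $$

Now vary $\mathbf{A}$ to $\mathbf{A} + \delta\mathbf{A}$, whence

$$ \delta S = \iint d^3x \left[ -\ddot{\mathbf{A}}\cdot\delta\mathbf{A} - (\operatorname{curl}\mathbf{A})\cdot(\operatorname{curl}\delta\mathbf{A}) + \mathbf{J}\cdot\delta\mathbf{A} \right] dt. \tag{1.106} $$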


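Here, we have already removed the time derivative from $\delta\mathbf{A}$ by integrating
by parts in the time direction. Now we do the integration by parts in the
space directions by using the identity

$$ \operatorname{div}\left( \delta\mathbf{A}\times(\operatorname{curl}\mathbf{A}) \right) = (\operatorname{curl}\mathbf{A})\cdot(\operatorname{curl}\delta\mathbf{A}) - \delta\mathbf{A}\cdot\left( \operatorname{curl}(\operatorname{curl}\mathbf{A}) \right) \tag{1.107} $$

and taking $\delta\mathbf{A}$ to vanish at spatial infinity, so the surface term, which would
come from the integral of the total divergence, is zero. We end up with

$$ \delta S = \iint d^3x \left\{ \delta\mathbf{A}\cdot\left[ -\ddot{\mathbf{A}} - \operatorname{curl}(\operatorname{curl}\mathbf{A}) + \mathbf{J} \right] \right\} dt. \tag{1.108} $$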


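Demanding that the variation of $S$ be zero thus requires

$$ \frac{\partial^2\mathbf{A}}{\partial t^2} = -\operatorname{curl}(\operatorname{curl}\mathbf{A}) + \mathbf{J}, \tag{1.109} $$

or, in terms of the physical fields,

$$ \operatorname{curl}\mathbf{B} = \mathbf{J} + \frac{\partial\mathbf{E}}{\partial t}. \tag{1.110} $$

This is Ampère’s law, as modified by Maxwell so as to include the displacement
current.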
How do we deal with the last Maxwell equation, Gauss’ law, which asserts
that $\operatorname{div}\mathbf{E} = \rho$? If $\rho$ were equal to zero, this equation would hold if
$\operatorname{div}\mathbf{A} = 0$, i.e. if $\mathbf{A}$ were solenoidal. In this case we might be tempted to impose the
constraint $\operatorname{div}\mathbf{A} = 0$ on the vector potential, but doing so would undo all
our good work, as we have been assuming that we can vary $\mathbf{A}$ freely.

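We notice, however, that the three Maxwell equations we already possess
tell us that

$$ \frac{\partial}{\partial t}\left( \operatorname{div}\mathbf{E} - \rho \right) = \operatorname{div}(\operatorname{curl}\mathbf{B}) - \left( \operatorname{div}\mathbf{J} + \frac{\partial\rho}{\partial t} \right). \tag{1.111} $$

Now $\operatorname{div}(\operatorname{curl}\mathbf{B}) = 0$, so the left-hand side is zero provided charge is
conserved, i.e. provided

$$ \dot\rho + \operatorname{div}\mathbf{J} = 0. \tag{1.112} $$

We assume that this is so. Thus, if Gauss’ law holds initially, it holds
eternally. We arrange for it to hold at $t = 0$ by imposing initial conditions on
$\mathbf{A}$. We first choose $\mathbf{A}|_{t=0}$ by requiring it to satisfy

$$ \mathbf{B}|_{t=0} = \operatorname{curl}\left( \mathbf{A}|_{t=0} \right). \tag{1.113} $$

The solution is not unique, because we may add any $\nabla\phi$ to $\mathbf{A}|_{t=0}$, but this
does not affect the physical $\mathbf{E}$ and $\mathbf{B}$ fields. The initial “velocities” $\dot{\mathbf{A}}|_{t=0}$
are then fixed uniquely by $\dot{\mathbf{A}}|_{t=0} = -\mathbf{E}|_{t=0}$, where the initial $\mathbf{E}$ satisfies
Gauss’ law. The subsequent evolution of $\mathbf{A}$ is then uniquely determined by
integrating the second-order equation (1.109).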

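The first integral for Maxwell is

$$ E = \int d^3x \left\{ \dot{\mathbf{A}}\cdot\frac{\delta L}{\delta\dot{\mathbf{A}}} \right\} - L = \int d^3x \left[ \tfrac12\left\{ \mathbf{E}^2 + \mathbf{B}^2 \right\} - \mathbf{J}\cdot\mathbf{A} \right]. \tag{1.114} $$

This will be conserved if $\mathbf{J}$ is time independent. If $\mathbf{J} = 0$, it is the total field
energy.

Suppose $\mathbf{J}$ is neither zero nor time independent. Then, looking back at
the derivation of the time-independence of the first integral, we see that if $L$
does depend on time, we instead have

$$ \frac{dE}{dt} = -\frac{\partial L}{\partial t}. \tag{1.115} $$

In the present case we have

$$ -\frac{\partial L}{\partial t} = -\int \dot{\mathbf{J}}\cdot\mathbf{A}\, d^3x, \tag{1.116} $$

so that

$$ -\int \dot{\mathbf{J}}\cdot\mathbf{A}\, d^3x = \frac{dE}{dt} = \frac{d}{dt}(\text{Field Energy}) - \int \left\{ \mathbf{J}\cdot\dot{\mathbf{A}} + \dot{\mathbf{J}}\cdot\mathbf{A} \right\} d^3x. \tag{1.117} $$

Thus, cancelling the duplicated term and using $\mathbf{E} = -\dot{\mathbf{A}}$, we find

$$ \frac{d}{dt}(\text{Field Energy}) = -\int \mathbf{J}\cdot\mathbf{E}\, d^3x. \tag{1.118} $$

Now $\int \mathbf{J}\cdot(-\mathbf{E})\, d^3x$ is the rate at which the power source driving the current
is doing work against the field. The result is therefore physically sensible.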


Continuum mechanics 


Because the mechanics of discrete objects can be derived from an action 
principle, it seems obvious that so must the mechanics of continua. This is 
certainly true if we use the Lagrangian description where we follow the his- 
tory of each particle composing the continuous material as it moves through 
space. In fluid mechanics it is more natural to describe the motion by using 
the Eulerian description in which we focus on what is going on at a partic- 
ular point in space by introducing a velocity field v(r,t). Eulerian action 
principles can still be found, but they seem to be logically distinct from the 
Lagrangian mechanics action principle, and mostly were not discovered until 
the 20th century. 

We begin by showing that Euler’s equation for the irrotational motion 
of an inviscid compressible fluid can be obtained by applying the action 
principle to a functional 


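$$ S[\phi, \rho] = \int dt\, d^3x \left\{ \rho\dot\phi + \tfrac12\rho(\nabla\phi)^2 + u(\rho) \right\}, \tag{1.119} $$

where $\rho$ is the mass density and the flow velocity is determined from the
velocity potential $\phi$ by $\mathbf{v} = \nabla\phi$. The function $u(\rho)$ is the internal energy
density.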
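Varying $S[\phi, \rho]$ with respect to $\rho$ is straightforward, and gives a
time-dependent generalization of (Daniel) Bernoulli’s equation

$$ \frac{\partial\phi}{\partial t} + \tfrac12\mathbf{v}^2 + h(\rho) = 0. \tag{1.120} $$

Here $h(\rho) \equiv du/d\rho$ is the specific enthalpy.¹ Varying with respect to $\phi$
requires an integration by parts, based on

$$ \operatorname{div}(\rho\,\delta\phi\,\nabla\phi) = \rho(\nabla\delta\phi)\cdot(\nabla\phi) + \delta\phi\,\operatorname{div}(\rho\nabla\phi), \tag{1.121} $$

and gives the equation of mass conservation

$$ \frac{\partial\rho}{\partial t} + \operatorname{div}(\rho\mathbf{v}) = 0. \tag{1.122} $$

Taking the gradient of Bernoulli’s equation, and using the fact that for
potential flow the vorticity $\boldsymbol{\omega} \equiv \operatorname{curl}\mathbf{v}$ is zero and so $\partial_i v_j = \partial_j v_i$, we find that

$$ \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\nabla h. \tag{1.123} $$

We now introduce the pressure $P$, which is related to $h$ by

$$ h(P) = \int_0^P \frac{dP}{\rho(P)}. \tag{1.124} $$

We see that $\rho\nabla h = \nabla P$, and so obtain Euler’s equation

$$ \rho\left( \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} \right) = -\nabla P. \tag{1.125} $$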


For future reference, we observe that combining the mass-conservation equation

$$ \partial_t\rho + \partial_j\{\rho v_j\} = 0 \tag{1.126} $$

with Euler’s equation

$$ \rho(\partial_t v_i + v_j\partial_j v_i) = -\partial_i P \tag{1.127} $$

yields

$$ \partial_t\{\rho v_i\} + \partial_j\{\rho v_i v_j + \delta_{ij}P\} = 0, \tag{1.128} $$

which expresses the local conservation of momentum. The quantity

$$ \Pi_{ij} = \rho v_i v_j + \delta_{ij}P \tag{1.129} $$

is the momentum-flux tensor, and is the $j$-th component of the flux of the
$i$-th component $p_i = \rho v_i$ of momentum density.

¹ The enthalpy $H = U + PV$ per unit mass. In general $u$ and $h$ will be functions of
both the density and the specific entropy. By taking $u$ to depend only on $\rho$ we are tacitly
assuming that specific entropy is constant. This makes the resultant flow barotropic,
meaning that the pressure is a function of the density only.


The relations $h = du/d\rho$ and $\rho = dP/dh$ show that $P$ and $u$ are related
by a Legendre transformation: $P = \rho h - u(\rho)$. From this, and the Bernoulli
equation, we see that the integrand in the action (1.119) is equal to minus
the pressure:

$$ -P = \rho\frac{\partial\phi}{\partial t} + \tfrac12\rho(\nabla\phi)^2 + u(\rho). \tag{1.130} $$
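
The Legendre-transform relations can be checked on a concrete equation of state. The sketch below assumes a polytropic internal energy density $u(\rho) = K\rho^\gamma/(\gamma - 1)$, which is our own illustrative choice and is not taken from the text, and verifies that $P = \rho h - u$ reduces to $K\rho^\gamma$ and that $dP/dh = \rho$. The constants and function names are hypothetical.

```python
# Assumed (illustrative) equation of state: u(rho) = K * rho**GAMMA / (GAMMA - 1).
K, GAMMA = 1.5, 1.4

def u(rho):
    """Internal energy density for the assumed polytropic fluid."""
    return K * rho**GAMMA / (GAMMA - 1.0)

def h(rho):
    """Specific enthalpy h = du/drho, differentiated by hand."""
    return K * GAMMA * rho**(GAMMA - 1.0) / (GAMMA - 1.0)

def pressure(rho):
    """Legendre transform P = rho*h - u(rho)."""
    return rho * h(rho) - u(rho)

def dP_dh(rho, eps=1e-5):
    """dP/dh along the curve parametrised by rho (central differences)."""
    dP = pressure(rho + eps) - pressure(rho - eps)
    dh = h(rho + eps) - h(rho - eps)
    return dP / dh
```

For this choice `pressure(rho)` agrees with `K * rho**GAMMA`, and `dP_dh(rho)` returns `rho` up to finite-difference error, which is the pair of statements $P = \rho h - u$ and $\rho = dP/dh$.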


This Eulerian formulation cannot be a “follow the particle” action principle
in a clever disguise. The mass conservation law is only a consequence
of the equation of motion, and is not built in from the beginning as a
constraint. Our variations in $\phi$ are therefore conjuring up new matter rather
than merely moving it around.


1.4 Variable endpoints 


We now relax our previous assumption that all boundary or surface terms 
arising from integrations by parts may be ignored. We will find that variation 
principles can be very useful for working out what boundary conditions we 
should impose on our differential equations. 


Consider the problem of building a railway across a parallel sided isthmus. 

Figure 1.9: Railway across isthmus.


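Suppose that the cost of construction is proportional to the length of the
track, but the cost of sea transport being negligible, we may locate the
terminal seaports wherever we like. We therefore wish to minimize the length

$$ L[y] = \int_{x_1}^{x_2} \sqrt{1 + (y')^2}\, dx, \tag{1.131} $$

by allowing both the path $y(x)$ and the endpoints $y(x_1)$ and $y(x_2)$ to vary.
Then

$$ L[y + \delta y] - L[y] = \int_{x_1}^{x_2} \frac{y'}{\sqrt{1 + (y')^2}}\,\delta y'\, dx $$
$$ = \delta y(x_2)\frac{y'(x_2)}{\sqrt{1 + (y'(x_2))^2}} - \delta y(x_1)\frac{y'(x_1)}{\sqrt{1 + (y'(x_1))^2}} - \int_{x_1}^{x_2} \delta y\,\frac{d}{dx}\left( \frac{y'}{\sqrt{1 + (y')^2}} \right) dx. \tag{1.132} $$

We have stationarity when both
i) the coefficient of $\delta y(x)$ in the integral,

$$ -\frac{d}{dx}\left( \frac{y'}{\sqrt{1 + (y')^2}} \right), \tag{1.133} $$

is zero. This requires that $y' = \text{const.}$, i.e. the track should be straight.

ii) The coefficients of $\delta y(x_1)$ and $\delta y(x_2)$ vanish. For this we need

$$ 0 = \frac{y'(x_1)}{\sqrt{1 + (y'(x_1))^2}} = \frac{y'(x_2)}{\sqrt{1 + (y'(x_2))^2}}. \tag{1.134} $$

This in turn requires that $y'(x_1) = y'(x_2) = 0$.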

The integrated-out bits have determined the boundary conditions that are to 
be imposed on the solution of the differential equation. In the present case 
they require us to build perpendicular to the coastline, and so we go straight 
across the isthmus. When boundary conditions are obtained from endpoint 
variations in this way, they are called natural boundary conditions. 
Example: Sliding String. A massive string of linear density $\rho$ is stretched
between two smooth posts separated by distance $2L$. The string is under
tension $T$, and is free to slide up and down the posts. We consider only
small deviations of the string from the horizontal.


Figure 1.10: Sliding string.


As we saw earlier, the Lagrangian for a stretched string is

$$ L = \int_{-L}^{L} \left\{ \tfrac12\rho\dot{y}^2 - \tfrac12 T y'^2 \right\} dx. \tag{1.135} $$

Now, Lagrange’s principle says that the equation of motion is found by
requiring the action

$$ S = \int_{t_i}^{t_f} L\, dt \tag{1.136} $$

to be stationary under variations of $y(x,t)$ that vanish at the initial and final
times, $t_i$ and $t_f$. It does not demand that $\delta y$ vanish at the ends of the string,
$x = \pm L$. So, when we make the variation, we must not assume this. Taking
care not to discard the results of the integration by parts in the $x$ direction,
we find

$$ \delta S = \int_{t_i}^{t_f}\!\int_{-L}^{L} \delta y(x,t)\left\{ -\rho\ddot{y} + T y'' \right\} dx\, dt - \int_{t_i}^{t_f} \delta y(L,t)\, T y'(L)\, dt + \int_{t_i}^{t_f} \delta y(-L,t)\, T y'(-L)\, dt. \tag{1.137} $$

The equation of motion, which arises from the variation within the interval,
is therefore the wave equation

$$ \rho\ddot{y} - T y'' = 0. \tag{1.138} $$

The boundary conditions, which come from the variations at the endpoints,
are

$$ y'(L,t) = y'(-L,t) = 0, \tag{1.139} $$

at all times $t$. These are the physically correct boundary conditions, because
any up-or-down component of the tension would provide a finite force on an
infinitesimal mass. The string must therefore be horizontal at its endpoints.
Example: Bead and String. Suppose now that a bead of mass $M$ is free to
slide up and down the $y$ axis, and is attached to the $x = 0$ end of our string.


Figure 1.11: A bead connected to a string.


The Lagrangian for the string-bead contraption is

$$ L = \tfrac12 M[\dot{y}(0)]^2 + \int_0^L \left\{ \tfrac12\rho\dot{y}^2 - \tfrac12 T y'^2 \right\} dx. \tag{1.140} $$
Here, as before, $\rho$ is the mass per unit length of the string and $T$ is its tension.
The end of the string at $x = L$ is fixed. By varying the action $S = \int L\,dt$,
and taking care not to throw away the boundary part at $x = 0$ we find that

$$ \delta S = \int_{t_i}^{t_f} \left[ T y' - M\ddot{y} \right]_{x=0} \delta y(0,t)\, dt + \int_{t_i}^{t_f}\!\int_0^L \left\{ T y'' - \rho\ddot{y} \right\} \delta y(x,t)\, dx\, dt. \tag{1.141} $$

The Euler-Lagrange equations are therefore

$$ \begin{aligned}
\rho\ddot{y}(x) - T y''(x) &= 0, \qquad 0 < x < L, \\
M\ddot{y}(0) - T y'(0) &= 0, \qquad y(L) = 0.
\end{aligned} \tag{1.142} $$

The boundary condition at $x = 0$ is the equation of motion for the bead. It
is clearly correct, because $T y'(0)$ is the vertical component of the force that
the string tension exerts on the bead.
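
To see the bead boundary condition at work, we can look for a normal mode. The following sketch is our own illustration, not part of the text: it assumes the ansatz $y = A\sin(k(L - x))\cos\omega t$ with $\omega = ck$ and $c = \sqrt{T/\rho}$, for which the condition $M\ddot{y}(0) - Ty'(0) = 0$ reduces to $k\tan(kL) = \rho/M$. The function name and the numerical values are hypothetical.

```python
import math

def lowest_mode_wavenumber(rho, M, L, tol=1e-12):
    """Lowest root of k*tan(k*L) = rho/M on (0, pi/(2L)), found by bisection.

    Assumes the normal-mode ansatz y = A sin(k(L - x)) cos(c k t), which
    satisfies the wave equation and the fixed-end condition y(L) = 0.
    """
    def f(k):
        return k * math.tan(k * L) - rho / M

    # f increases from -rho/M at k = 0 to +infinity as k -> pi/(2L).
    lo, hi = 0.0, math.pi / (2 * L) * (1 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Example with made-up values: rho = 1, M = 2, L = 1.
k = lowest_mode_wavenumber(1.0, 2.0, 1.0)
```

Two limits are a useful sanity check on the ansatz: a very heavy bead pushes $kL$ towards $0$ with $k^2 \approx \rho/(ML)$, so the bead oscillates as if on a spring of constant $T/L$, while a very light bead gives $kL \to \pi/2$, the free-end result of the sliding string.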

These examples led to boundary conditions that we could easily have 
figured out for ourselves without the variational principle. The next exam- 
ple shows that a variational formulation can be exploited to obtain a set of 
boundary conditions that might be difficult to write down by purely “physi- 
cal” reasoning. 


Figure 1.12: Gravity waves on water.


Harder example: Gravity waves on the surface of water. An action suitable
for describing water waves is given by² $S[\phi, h] = \int L\,dt$, where

$$ L = \int dx \int_0^{h(x,t)} \rho_0\left\{ \frac{\partial\phi}{\partial t} + \tfrac12(\nabla\phi)^2 + g y \right\} dy. \tag{1.143} $$

² J. C. Luke, J. Fluid Dynamics, 27 (1967) 395.
Here $\phi$ is the velocity potential and $\rho_0$ is the density of the water. The density
will not be varied because the water is being treated as incompressible. As
before, the flow velocity is given by $\mathbf{v} = \nabla\phi$. By varying $\phi(x,y,t)$ and the
depth $h(x,t)$, and taking care not to throw away any integrated-out parts of
the variation at the physical boundaries, we obtain:

$$ \begin{aligned}
\nabla^2\phi &= 0, && \text{within the fluid,} \\
\frac{\partial\phi}{\partial t} + \tfrac12(\nabla\phi)^2 + g y &= 0, && \text{on the free surface,} \\
\frac{\partial\phi}{\partial y} &= 0, && \text{on } y = 0, \\
\frac{\partial h}{\partial t} - \frac{\partial\phi}{\partial y} + \frac{\partial h}{\partial x}\frac{\partial\phi}{\partial x} &= 0, && \text{on the free surface.}
\end{aligned} \tag{1.144} $$


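The first equation comes from varying $\phi$ within the fluid, and it simply
confirms that the flow is incompressible, i.e. obeys $\operatorname{div}\mathbf{v} = 0$. The second
comes from varying $h$, and is the Bernoulli equation stating that we have
$P = P_0$ (atmospheric pressure) everywhere on the free surface. The third,
from the variation of $\phi$ at $y = 0$, states that no fluid escapes through the
lower boundary.

Obtaining and interpreting the last equation, involving $\partial h/\partial t$, is somewhat
trickier. It comes from the variation of $\phi$ on the upper boundary. The
variation of $S$ due to $\delta\phi$ is

$$ \delta S = \int \rho_0\left\{ \frac{\partial}{\partial t}\delta\phi + \frac{\partial}{\partial x}\left( \delta\phi\frac{\partial\phi}{\partial x} \right) + \frac{\partial}{\partial y}\left( \delta\phi\frac{\partial\phi}{\partial y} \right) - \delta\phi\,\nabla^2\phi \right\} dt\, dx\, dy. \tag{1.145} $$

The first three terms in the integrand constitute the three-dimensional
divergence $\operatorname{div}(\delta\phi\,\boldsymbol{\Phi})$, where, listing components in the order $t, x, y$,

$$ \boldsymbol{\Phi} = \left[ 1, \frac{\partial\phi}{\partial x}, \frac{\partial\phi}{\partial y} \right]. \tag{1.146} $$

The integrated-out part on the upper surface is therefore $\int \rho_0\,(\boldsymbol{\Phi}\cdot\mathbf{n})\,\delta\phi\, d|S|$.
Here, the outward normal is

$$ \mathbf{n} = \left( 1 + \left( \frac{\partial h}{\partial t} \right)^2 + \left( \frac{\partial h}{\partial x} \right)^2 \right)^{-1/2} \left[ -\frac{\partial h}{\partial t}, -\frac{\partial h}{\partial x}, 1 \right], \tag{1.147} $$

and the element of area

$$ d|S| = \left( 1 + \left( \frac{\partial h}{\partial t} \right)^2 + \left( \frac{\partial h}{\partial x} \right)^2 \right)^{1/2} dt\, dx. \tag{1.148} $$

The boundary variation is thus

$$ \delta S|_{y=h} = \int \rho_0\left\{ \frac{\partial\phi}{\partial y} - \frac{\partial h}{\partial t} - \frac{\partial h}{\partial x}\frac{\partial\phi}{\partial x} \right\} \delta\phi(x, h(x,t), t)\, dx\, dt. \tag{1.149} $$

Requiring this variation to be zero for arbitrary $\delta\phi(x, h(x,t), t)$ leads to

$$ \frac{\partial h}{\partial t} - \frac{\partial\phi}{\partial y} + \frac{\partial h}{\partial x}\frac{\partial\phi}{\partial x} = 0. \tag{1.150} $$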


This last boundary condition expresses the geometrical constraint that the
surface moves with the fluid it bounds, or, in other words, that a fluid particle
initially on the surface stays on the surface. To see that this is so, define
$f(x,y,t) = h(x,t) - y$. The free surface is then determined by $f(x,y,t) = 0$.
Because the surface particles are carried with the flow, the convective
derivative of $f$,

$$ \frac{df}{dt} \equiv \frac{\partial f}{\partial t} + (\mathbf{v}\cdot\nabla)f, \tag{1.151} $$

must vanish on the free surface. Using $\mathbf{v} = \nabla\phi$ and the definition of $f$, this
reduces to

$$ \frac{\partial h}{\partial t} + \frac{\partial\phi}{\partial x}\frac{\partial h}{\partial x} - \frac{\partial\phi}{\partial y} = 0, \tag{1.152} $$

which is indeed the last boundary condition.
1.5 Lagrange multipliers


Figure 1.13: Road on hill.


Figure 1.13 shows the contour map of a hill of height $h = f(x,y)$. The
hill is traversed by a road whose points satisfy the equation $g(x,y) = 0$. Our
challenge is to use the data $h(x,y)$ and $g(x,y)$ to find the highest point on
the road.

When $\mathbf{r}$ changes by $d\mathbf{r} = (dx, dy)$, the height $f$ changes by

$$ df = \nabla f\cdot d\mathbf{r}, \tag{1.153} $$

where $\nabla f = (\partial_x f, \partial_y f)$. The highest point, being a stationary point, will
have $df = 0$ for all displacements $d\mathbf{r}$ that stay on the road, that is, for
all $d\mathbf{r}$ such that $dg = 0$. Thus $\nabla f\cdot d\mathbf{r}$ must be zero for those $d\mathbf{r}$ such that
$0 = \nabla g\cdot d\mathbf{r}$. In other words, at the highest point $\nabla f$ will be orthogonal to
all vectors that are orthogonal to $\nabla g$. This is possible only if the vectors $\nabla f$
and $\nabla g$ are parallel, and so $\nabla f = \lambda\nabla g$ for some $\lambda$.

To find the stationary point, therefore, we solve the equations

$$ \nabla f - \lambda\nabla g = 0, \qquad g(x,y) = 0, \tag{1.154} $$

simultaneously.
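Example: Let $f = x^2 + y^2$ and $g = x + y - 1$. Then $\nabla f = 2(x,y)$ and
$\nabla g = (1,1)$. So

$$ 2(x,y) - \lambda(1,1) = 0 \;\Rightarrow\; (x,y) = \frac{\lambda}{2}(1,1), $$
$$ x + y = 1 \;\Rightarrow\; \lambda = 1 \;\Rightarrow\; (x,y) = \left( \tfrac12, \tfrac12 \right). $$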
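
The example can be cross-checked by brute force. The sketch below is an illustration of ours, with hypothetical helper names: it confirms that $\nabla f = \lambda\nabla g$ at $(\tfrac12, \tfrac12)$ with $\lambda = 1$, and separately scans $f$ along the road, parametrised as $(x, y) = (t, 1 - t)$, to locate its stationary point directly.

```python
def grad_f(x, y):
    """Gradient of f = x^2 + y^2."""
    return (2 * x, 2 * y)

def grad_g(x, y):
    """Gradient of g = x + y - 1."""
    return (1.0, 1.0)

# Multiplier solution: 2x = lam, 2y = lam, x + y = 1  =>  lam = 1.
lam = 1.0
x, y = lam / 2, lam / 2

# Independent check: scan f along the road (x, y) = (t, 1 - t).
ts = [i / 1000 for i in range(-1000, 2001)]
t_best = min(ts, key=lambda t: t**2 + (1 - t)**2)
```

The scan returns `t_best = 0.5`, in agreement with the multiplier calculation. For this particular $f$ the stationary point on the road is a minimum of $f$; the multiplier condition locates stationary points without telling us their character.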
When there are $n$ constraints, $g_1 = g_2 = \cdots = g_n = 0$, we want $\nabla f$ to lie
in

$$ \left( \langle\nabla g_i\rangle^\perp \right)^\perp = \langle\nabla g_i\rangle, \tag{1.155} $$

where $\langle \mathbf{e}_i \rangle$ denotes the space spanned by the vectors $\mathbf{e}_i$ and $\langle \mathbf{e}_i \rangle^\perp$ is
its orthogonal complement. Thus $\nabla f$ lies in the space spanned by the
vectors $\nabla g_i$, so there must exist $n$ numbers $\lambda_i$ such that

$$ \nabla f = \sum_{i=1}^n \lambda_i\nabla g_i. \tag{1.156} $$

The numbers $\lambda_i$ are called Lagrange multipliers. We can therefore regard our
problem as one of finding the stationary points of an auxiliary function

$$ F = f - \sum_i \lambda_i g_i, \tag{1.157} $$

with the $n$ undetermined multipliers $\lambda_i$, $i = 1, \ldots, n$, subsequently being fixed
by imposing the $n$ requirements that $g_i = 0$, $i = 1, \ldots, n$.


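Example: Find the stationary points of

$$ F(\mathbf{x}) = \tfrac12\,\mathbf{x}\cdot\mathbf{A}\mathbf{x} = \tfrac12 x_i A_{ij} x_j \tag{1.158} $$

on the surface $\mathbf{x}\cdot\mathbf{x} = 1$. Here $A_{ij}$ is a symmetric matrix.
Solution: We look for stationary points of

$$ G(\mathbf{x}) = F(\mathbf{x}) - \tfrac12\lambda|\mathbf{x}|^2. \tag{1.159} $$

The derivatives we need are

$$ \frac{\partial F}{\partial x_k} = \tfrac12\delta_{ki}A_{ij}x_j + \tfrac12 x_i A_{ij}\delta_{jk} = A_{kj}x_j, \tag{1.160} $$

and

$$ \frac{\partial}{\partial x_k}\left( \frac{\lambda}{2}x_j x_j \right) = \lambda x_k. \tag{1.161} $$

Thus, the stationary points must satisfy

$$ A_{kj}x_j = \lambda x_k, \qquad x_i x_i = 1, \tag{1.162} $$

and so are the normalized eigenvectors of the matrix $\mathbf{A}$. The Lagrange
multiplier at each stationary point is the corresponding eigenvalue.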
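
A two-dimensional instance makes this concrete. In the sketch below, which is our own illustration with an arbitrarily chosen symmetric matrix and hypothetical function names, we parametrise the unit circle as $\mathbf{x} = (\cos t, \sin t)$, solve $dF/dt = 0$ in closed form, and check that each stationary point satisfies $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$ with $\lambda = \mathbf{x}\cdot\mathbf{A}\mathbf{x}$.

```python
import math

A = [[2.0, 1.0], [1.0, 3.0]]   # arbitrary symmetric matrix, for illustration only

def matvec(A, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def stationary_points(A):
    """Stationary points of F = x.Ax/2 on the unit circle x = (cos t, sin t).

    dF/dt = ((d - a) sin 2t + 2b cos 2t)/2 vanishes when tan 2t = 2b/(a - d).
    Returns a list of (lambda, x, residual) with residual = max|Ax - lambda x|.
    """
    a, b, d = A[0][0], A[0][1], A[1][1]
    t0 = 0.5 * math.atan2(2 * b, a - d)
    points = []
    for t in (t0, t0 + math.pi / 2):
        x = [math.cos(t), math.sin(t)]
        Ax = matvec(A, x)
        lam = x[0] * Ax[0] + x[1] * Ax[1]   # multiplier = x.Ax because x.x = 1
        residual = max(abs(Ax[i] - lam * x[i]) for i in range(2))
        points.append((lam, x, residual))
    return points

points = stationary_points(A)
```

For this matrix the two multipliers come out as $(5 \pm \sqrt5)/2$, the eigenvalues of $\mathbf{A}$, and the residuals vanish to rounding error.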
Example: Statistical Mechanics. Let $\Gamma$ denote the classical phase space of
a mechanical system of $n$ particles governed by Hamiltonian $H(p,q)$. Let
$d\Gamma$ be the Liouville measure $d^{3n}p\, d^{3n}q$. In statistical mechanics we work
with a probability density $\rho(p,q)$ such that $\rho(p,q)\,d\Gamma$ is the probability of
the system being in a state in the small region $d\Gamma$. The entropy associated
with the probability distribution is the functional

$$ S[\rho] = -\int_\Gamma \rho\ln\rho\, d\Gamma. \tag{1.163} $$

We wish to find the $\rho(p,q)$ that maximizes the entropy for a given energy

$$ \langle E \rangle = \int_\Gamma \rho H\, d\Gamma. \tag{1.164} $$

We cannot vary $\rho$ freely as we should preserve both the energy and the
normalization condition

$$ \int_\Gamma \rho\, d\Gamma = 1 \tag{1.165} $$

that is required of any probability distribution. We therefore introduce two
Lagrange multipliers, $1 + \alpha$ and $\beta$, to enforce the normalization and energy
conditions, and look for stationary points of

$$ F[\rho] = \int_\Gamma \left\{ -\rho\ln\rho + (\alpha + 1)\rho - \beta\rho H \right\} d\Gamma. \tag{1.166} $$

Now we can vary $\rho$ freely, and hence find that

$$ \delta F = \int_\Gamma \left\{ -\ln\rho + \alpha - \beta H \right\} \delta\rho\, d\Gamma. \tag{1.167} $$

Requiring this to be zero gives us

$$ \rho(p,q) = e^{\alpha - \beta H(p,q)}, \tag{1.168} $$

where $\alpha$, $\beta$ are determined by imposing the normalization and energy
constraints. This probability density is known as the canonical distribution, and
the parameter $\beta$ is the inverse temperature $\beta = 1/T$.
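
The way the two multipliers get fixed is easiest to see in a discrete analogue. The sketch below replaces phase space by a finite list of energy levels, which is an assumption made purely for illustration, and finds $\beta$ by bisection so that $p_i \propto e^{-\beta E_i}$ has a prescribed mean energy; normalization then fixes $\alpha = -\ln Z$. The function name, bracket and level values are hypothetical.

```python
import math

def canonical(energies, mean_energy, lo=-50.0, hi=50.0, tol=1e-12):
    """Return (beta, p) with p_i proportional to exp(-beta*E_i) and <E> = mean_energy.

    Uses bisection on beta; <E> is a decreasing function of beta.
    The bracket [lo, hi] is an assumption suited to energies of order one.
    """
    def mean(beta):
        w = [math.exp(-beta * e) for e in energies]
        return sum(e * wi for e, wi in zip(energies, w)) / sum(w)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > mean_energy:
            lo = mid          # mean too high: need a larger beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)                # partition function; alpha = -ln(z)
    return beta, [wi / z for wi in w]

# Three made-up levels with a target mean energy below the uniform average.
beta, p = canonical([0.0, 1.0, 2.0], mean_energy=0.6)
```

Because the target mean lies below the uniform average of the levels, the resulting $\beta$ is positive, and $\ln p_i + \beta E_i$ is the same for every level, which is the discrete form of the stationarity condition.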

Example: The Catenary. At last we have the tools to solve the problem of
the hanging chain of fixed length. We wish to minimize the potential energy

$$ E[y] = \int_{-L}^{L} y\sqrt{1 + (y')^2}\, dx, \tag{1.169} $$

subject to the constraint

$$ \ell[y] = \int_{-L}^{L} \sqrt{1 + (y')^2}\, dx = \text{const.}, \tag{1.170} $$

where the constant is the length of the chain. We introduce a Lagrange
multiplier $\lambda$ and find the stationary points of

$$ F[y] = \int_{-L}^{L} (y - \lambda)\sqrt{1 + (y')^2}\, dx, \tag{1.171} $$

so, following our earlier methods, we find

$$ y = \lambda + \kappa\cosh\frac{(x + a)}{\kappa}. \tag{1.172} $$

We choose $\kappa, \lambda, a$ to fix the two endpoints (two conditions) and the length
(one condition).
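
Fixing the three constants is a small root-finding problem. As a sketch, assume the chain is hung symmetrically from $(\pm L, y_{\rm end})$, so that $a = 0$ by symmetry; the arc length of $y = \lambda + \kappa\cosh(x/\kappa)$ between $x = \pm L$ is then $2\kappa\sinh(L/\kappa)$, and we can solve for $\kappa$ by bisection. The symmetric set-up, the function name and the bracket used below are assumptions of this example.

```python
import math

def symmetric_catenary(L, length, y_end=0.0):
    """Return (kappa, lam) for y = lam + kappa*cosh(x/kappa) hung from (+-L, y_end).

    Solves 2*kappa*sinh(L/kappa) = length by bisection. The left bracket
    L/50 assumes the chain is not absurdly long compared with the span.
    """
    if length <= 2 * L:
        raise ValueError("the chain must be longer than the span 2L")

    def arc(kappa):
        return 2 * kappa * math.sinh(L / kappa)   # decreasing in kappa

    lo, hi = L / 50.0, 1.0
    while arc(hi) > length:       # enlarge hi until it brackets the root
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if arc(mid) > length:
            lo = mid
        else:
            hi = mid
    kappa = 0.5 * (lo + hi)
    lam = y_end - kappa * math.cosh(L / kappa)    # endpoint condition
    return kappa, lam

kappa, lam = symmetric_catenary(L=1.0, length=3.0)
```

With $a = 0$ imposed by symmetry, the length condition determines $\kappa$ and the endpoint height then determines $\lambda$, matching the count of conditions in the text.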

Example: Sturm-Liouville Problem. We wish to find the stationary points
of the quadratic functional

$$ J[y] = \int_{x_1}^{x_2} \tfrac12\left\{ p(x)(y')^2 + q(x)y^2 \right\} dx, \tag{1.173} $$

subject to the boundary conditions $y(x) = 0$ at the endpoints $x_1, x_2$ and the
normalization

$$ K[y] = \int_{x_1}^{x_2} y^2\, dx = 1. \tag{1.174} $$

Taking the variation of $J - (\lambda/2)K$, we find

$$ \delta\left( J - \tfrac{\lambda}{2}K \right) = \int_{x_1}^{x_2} \left\{ -(py')' + qy - \lambda y \right\} \delta y\, dx. \tag{1.175} $$

Stationarity therefore requires

$$ -(py')' + qy = \lambda y, \qquad y(x_1) = y(x_2) = 0. \tag{1.176} $$

This is the Sturm-Liouville eigenvalue problem. It is an infinite dimensional
analogue of the $F(\mathbf{x}) = \tfrac12\,\mathbf{x}\cdot\mathbf{A}\mathbf{x}$ problem.
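
Multiplying (1.176) by $y$ and integrating shows that at a stationary point the multiplier equals the ratio $2J/K$. The sketch below estimates this ratio numerically for the simplest case $p = 1$, $q = 0$ on $[0, 1]$, a choice made only for illustration, and compares the eigenfunction $\sin\pi x$ with the trial function $x(1 - x)$. The function name and the grid size are hypothetical.

```python
import math

def rayleigh_quotient(y, n=2000):
    """Approximate 2J/K = int y'^2 dx / int y^2 dx on [0, 1] for p = 1, q = 0.

    y' is estimated by forward differences on n cells and the y^2 integral
    by the trapezoid rule; y is assumed to vanish at both endpoints.
    """
    h = 1.0 / n
    vals = [y(i * h) for i in range(n + 1)]
    num = sum((vals[i + 1] - vals[i]) ** 2 / h for i in range(n))
    den = sum(0.5 * (vals[i] ** 2 + vals[i + 1] ** 2) * h for i in range(n))
    return num / den

lam_sine = rayleigh_quotient(lambda x: math.sin(math.pi * x))
lam_parabola = rayleigh_quotient(lambda x: x * (1.0 - x))
```

The sine gives a value close to $\pi^2$, the lowest eigenvalue of $-y'' = \lambda y$ with these boundary conditions, while the parabola gives a value close to $10$, slightly larger, as we expect if the lowest eigenfunction minimizes the ratio.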


Example: Irrotational Flow Again. Consider the action functional

$$ S[\mathbf{v}, \phi, \rho] = \int \left\{ \tfrac12\rho\mathbf{v}^2 - u(\rho) + \phi\left( \frac{\partial\rho}{\partial t} + \operatorname{div}\rho\mathbf{v} \right) \right\} dt\, d^3x. \tag{1.177} $$

This is similar to our previous action for the irrotational barotropic flow of an
inviscid fluid, but here $\mathbf{v}$ is an independent variable and we have introduced
infinitely many Lagrange multipliers $\phi(x,t)$, one for each point of space-time,
so as to enforce the equation of mass conservation $\dot\rho + \operatorname{div}\rho\mathbf{v} = 0$ everywhere,
and at all times. Equating $\delta S/\delta\mathbf{v}$ to zero gives $\mathbf{v} = \nabla\phi$, and so these Lagrange
multipliers become the velocity potential as a consequence of the equations
of motion. The Bernoulli and Euler equations now follow almost as before.
Because the equation $\mathbf{v} = \nabla\phi$ does not involve time derivatives, this is
one of the cases where it is legitimate to substitute a consequence of the
action principle back into the action. If we do this, we recover our previous
formulation.


1.6 Maximum or minimum? 


We have provided many examples of stationary points in function space. We 
have said almost nothing about whether these stationary points are maxima 
or minima. There is a reason for this: investigating the character of the 
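stationary point requires the computation of the second functional derivative

$$ \frac{\delta^2 J}{\delta y(x_1)\,\delta y(x_2)} $$

and the use of the functional version of Taylor’s theorem to expand about
the stationary point $y(x)$:

$$ J[y + \varepsilon\eta] = J[y] + \varepsilon\int \eta(x)\left.\frac{\delta J}{\delta y(x)}\right|_y dx + \frac{\varepsilon^2}{2}\iint \eta(x_1)\eta(x_2)\left.\frac{\delta^2 J}{\delta y(x_1)\,\delta y(x_2)}\right|_y dx_1\, dx_2 + \cdots. \tag{1.178} $$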
Since $y(x)$ is a stationary point, the term with $\delta J/\delta y(x)|_y$ vanishes. Whether
$y(x)$ is a maximum, a minimum, or a saddle therefore depends on the number
of positive and negative eigenvalues of $\delta^2 J/\delta(y(x_1))\delta(y(x_2))$, a matrix with
a continuous infinity of rows and columns, these being labeled by $x_1$ and
$x_2$ respectively. It is not easy to diagonalize a continuously infinite matrix!
Consider, for example, the functional

$$ J[y] = \int_a^b \tfrac12\left\{ p(x)(y')^2 + q(x)y^2 \right\} dx, \tag{1.179} $$

with $y(a) = y(b) = 0$. Here, as we already know,

$$ \frac{\delta J}{\delta y(x)} = Ly \equiv -\frac{d}{dx}\left( p(x)\frac{d}{dx}y(x) \right) + q(x)y(x), \tag{1.180} $$

and, except in special cases, this will be zero only if $y(x) \equiv 0$. We might
reasonably expect the second derivative to be

$$ \frac{\delta}{\delta y}(Ly) = L, \tag{1.181} $$

where $L$ is the Sturm-Liouville differential operator

$$ L = -\frac{d}{dx}\left( p(x)\frac{d}{dx} \right) + q(x). \tag{1.182} $$

How can a differential operator be a matrix like $\delta^2 J/\delta(y(x_1))\delta(y(x_2))$?
We can formally compute the second derivative by exploiting the Dirac
delta “function” $\delta(x)$ which has the property that

$$ y(x_2) = \int \delta(x_2 - x_1)\,y(x_1)\, dx_1. \tag{1.183} $$

Thus

$$ \delta y(x_2) = \int \delta(x_2 - x_1)\,\delta y(x_1)\, dx_1, \tag{1.184} $$

from which we read off that

$$ \frac{\delta y(x_2)}{\delta y(x_1)} = \delta(x_2 - x_1). \tag{1.185} $$

Using (1.185), we find that

$$ \frac{\delta}{\delta y(x_1)}\left( \frac{\delta J}{\delta y(x_2)} \right) = -\frac{d}{dx_2}\left( p(x_2)\frac{d}{dx_2}\delta(x_2 - x_1) \right) + q(x_2)\,\delta(x_2 - x_1). \tag{1.186} $$

How are we to make sense of this expression? We begin in the next chapter
where we explain what it means to differentiate $\delta(x)$, and show that (1.186)
does indeed correspond to the differential operator $L$. In subsequent chapters
we explore the manner in which differential operators and matrices are
related. We will learn that just as some matrices can be diagonalized so can
some differential operators, and that the class of diagonalizable operators
includes (1.182).

If all the eigenvalues of $L$ are positive, our stationary point was a minimum.
For each negative eigenvalue, there is a direction in function space in
which $J[y]$ decreases as we move away from the stationary point.


1.7 Further exercises and problems 


Here is a collection of problems relating to the calculus of variations. Some 
date back to the 16th century, others are quite recent in origin. 


Exercise 1.1: A smooth path in the $x$-$y$ plane is given by $\mathbf{r}(t) = (x(t), y(t))$
with $\mathbf{r}(0) = \mathbf{a}$, and $\mathbf{r}(1) = \mathbf{b}$. The length of the path from $\mathbf{a}$ to $\mathbf{b}$ is therefore

$$ S[\mathbf{r}] = \int_0^1 \sqrt{\dot{x}^2 + \dot{y}^2}\, dt, $$

where $\dot{x} \equiv dx/dt$, $\dot{y} \equiv dy/dt$. Write down the Euler-Lagrange conditions for
$S[\mathbf{r}]$ to be stationary under small variations of the path that keep the endpoints
fixed, and hence show that the shortest path between two points is a straight
line.


Exercise 1.2: Fermat’s principle. A medium is characterised optically by
its refractive index $n$, such that the speed of light in the medium is $c/n$.
According to Fermat (1657), the path taken by a ray of light between any
two points makes stationary the travel time between those points. Assume
that the ray propagates in the $x,y$ plane in a layered medium with refractive
index $n(x)$. Use Fermat’s principle to establish Snell’s law in its general form
$n(x)\sin\psi = \text{constant}$ by finding the equation giving the stationary paths $y(x)$
for

$$ F_1[y] = \int n(x)\sqrt{1 + y'^2}\, dx. $$

(Here the prime denotes differentiation with respect to $x$.) Repeat this exercise
for the case that $n$ depends only on $y$ and find a similar equation for the
stationary paths of

$$ F_2[y] = \int n(y)\sqrt{1 + y'^2}\, dx. $$

By using suitable definitions of the angle of incidence $\psi$ in each case, show
that the two formulations of the problem give physically equivalent answers.
In the second formulation you will find it easiest to use the first integral of
Euler’s equation.


Problem 1.3: Hyperbolic Geometry. This problem introduces a version of the 
Poincaré model for the non-Euclidean geometry of Lobachevski. 


a) Show that the stationary paths for the functional

$$ F_3[y] = \int \frac{1}{y}\sqrt{1 + y'^2}\, dx, $$

with $y(x)$ restricted to lying in the upper half plane are semi-circles of
arbitrary radius and with centres on the $x$ axis. These paths are the
geodesics, or minimum length paths, in a space with Riemann metric

$$ ds^2 = \frac{1}{y^2}\left( dx^2 + dy^2 \right), \qquad y > 0. $$

b) Show that if we call these geodesics “lines”, then one and only one line
can be drawn through two given points.

c) Two lines are said to be parallel if, and only if, they meet at “infinity”,
i.e. on the $x$ axis. (Verify that the $x$ axis is indeed infinitely far from any
point with $y > 0$.) Show that, given a line $q$ and a point $A$ not lying on
that line, there are two lines passing through $A$ that are parallel to
$q$, and that between these two lines lies a pencil of lines passing through
$A$ that never meet $q$.


Problem 1.4: Elastic Rods. The elastic energy per unit length of a bent steel
rod is given by $\tfrac12 YI/R^2$. Here $R$ is the radius of curvature due to the bending,
$Y$ is the Young’s modulus of the steel and $I = \iint y^2\, dx\, dy$ is the moment
of inertia of the rod’s cross section about an axis through its centroid and
perpendicular to the plane in which the rod is bent. If the rod is only slightly
bent into the $yz$ plane and lies close to the $z$ axis, show that this elastic energy
can be approximated as

$$ U[y] = \int_0^L \tfrac12 YI\,(y'')^2\, dz, $$

where the prime denotes differentiation with respect to $z$ and $L$ is the length
of the rod. We will use this approximate energy functional to discuss two
practical problems.


Figure 1.14: A rod used as: a) a column, b) a cantilever.


a) Euler’s problem: the buckling of a slender column. The rod is used as
a column which supports a compressive load $Mg$ directed along the $z$
axis (which is vertical). Show that when the rod buckles slightly (i.e.
deforms with both ends remaining on the $z$ axis) the total energy,
including the gravitational potential energy of the loading mass $M$, can be
approximated by

$$ U[y] = \int_0^L \left\{ \frac{YI}{2}(y'')^2 - \frac{Mg}{2}(y')^2 \right\} dz. $$

By considering small deformations of the form

$$ y(z) = \sum_{n=1}^\infty a_n\sin\frac{n\pi z}{L}, $$

show that the column is unstable to buckling and collapse if $Mg \geq \pi^2 YI/L^2$.

b) Leonardo da Vinci’s problem: the light cantilever. Here we take the $z$
axis as horizontal and the $y$ axis as being vertical. The rod is used as
a beam or cantilever and is fixed into a wall so that $y(0) = 0 = y'(0)$.
A weight $Mg$ is hung from the end $z = L$ and the beam sags in the $-y$
direction. We wish to find $y(z)$ for $0 < z < L$. We will ignore the weight
of the beam itself.

• Write down the complete expression for the energy, including the
gravitational potential energy of the weight.
• Find the differential equation and boundary conditions at $z = 0, L$
that arise from minimizing the total energy. In doing this take care
not to throw away any term arising from the integration by parts.
You may find the following identity to be of use:

$$ \frac{d}{dz}\left( f'g'' - fg''' \right) = f''g'' - fg''''. $$

• Solve the equation. You should find that the displacement of the
end of the beam is $y(L) = -\tfrac13 MgL^3/YI$.


Exercise 1.5: Suppose that an elastic body $\Omega$ of density $\rho$ is slightly deformed
so that the point that was at cartesian co-ordinate $x_i$ is moved to $x_i + \eta_i(x)$.
We define the resulting strain tensor $e_{ij}$ by

$$ e_{ij} = \frac12\left( \frac{\partial\eta_j}{\partial x_i} + \frac{\partial\eta_i}{\partial x_j} \right). $$

It is automatically symmetric in its indices. The Lagrangian for small-amplitude
elastic motion of the body is

$$ L[\eta] = \int_\Omega \left\{ \tfrac12\rho\,\dot\eta_i^2 - \tfrac12 e_{ij}c_{ijkl}e_{kl} \right\} d^3x. $$

Here, $c_{ijkl}$ is the tensor of elastic constants, which has the symmetries

$$ c_{ijkl} = c_{klij} = c_{jikl} = c_{ijlk}. $$

By varying the $\eta_i$, show that the equation of motion for the body is

$$ \rho\frac{\partial^2\eta_i}{\partial t^2} - \frac{\partial}{\partial x_j}\sigma_{ji} = 0, $$

where

$$ \sigma_{ij} = c_{ijkl}e_{kl} $$

is the stress tensor. Show that variations of $\eta_i$ on the boundary $\partial\Omega$ give as
boundary conditions

$$ \sigma_{ij}n_j = 0, $$

where $n_i$ are the components of the outward normal on $\partial\Omega$.
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Figure 1.15: Weighted line. 


Problem 1.6: The catenary revisited. We can describe a catenary curve in
parametric form as x(s), y(s), where s is the arc-length. The potential
energy is then simply $\int \rho g\, y(s)\, ds$, where ρ is the mass per unit length of the
hanging chain. The x, y are not independent functions of s, however, because
$\dot x^2 + \dot y^2 = 1$ at every point on the curve. Here a dot denotes a derivative with
respect to s.

a) Introduce infinitely many Lagrange multipliers λ(s) to enforce the
$\dot x^2 + \dot y^2 = 1$ constraint, one for each point s on the curve. From the resulting
functional derive two coupled equations describing the catenary, one for x(s)
and one for y(s). By thinking about the forces acting on a small section
of the cable, and perhaps by introducing the angle ψ where $\dot x = \cos\psi$ and
$\dot y = \sin\psi$, so that s and ψ are intrinsic coordinates for the curve,
interpret these equations and show that λ(s) is proportional to the
position-dependent tension T(s) in the chain.

b) You are provided with a light-weight line of length πa/2 and some lead
shot of total mass M. By using equations from the previous part (suitably
modified to take into account the position dependent ρ(s)) or otherwise,
determine how the lead should be distributed along the line if the
loaded line is to hang in an arc of a circle of radius a (see figure 1.15)
when its ends are attached to two points at the same height.


Problem 1.7: Another model for Lobachevski geometry (see exercise 1.3)
is the Poincaré disc. This space consists of the interior of the unit disc
$D^2 = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$ equipped with Riemann metric

$$ds^2 = \frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2}.$$

The geodesic paths are found by minimizing the arc-length functional

$$s[\mathbf{r}] = \int ds = \int \frac{\sqrt{\dot x^2 + \dot y^2}}{1 - x^2 - y^2}\, dt,$$

where r(t) = (x(t), y(t)) and a dot indicates a derivative with respect to the
parameter t.


Figure 1.16: The Poincaré disc of problem 1.7. The radius OP of the Poincaré
disc is unity, while the radius of the geodesic arc PQR is PX = QX = RX = R.
The distance between the centres of the disc and arc is OX = $x_0$. Your
task in part c) is to show that ∠OPX = ∠ORX = 90°.


a) Either by manipulating the two Euler-Lagrange equations that give the
conditions for s[r] to be stationary under variations in r(t), or, more
efficiently, by observing that s[r] is invariant under the infinitesimal rotation

$$\delta x = \epsilon y, \qquad \delta y = -\epsilon x,$$

and applying Noether’s theorem, show that the parameterised geodesics
obey

$$\frac{d}{dt}\left(\frac{1}{1 - x^2 - y^2}\,\frac{x\dot y - y\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right) = 0.$$

b) Given a point (a,b) within $D^2$, and a direction through it, show that
the equation you derived in part a) determines a unique geodesic curve
passing through (a,b) in the given direction, but does not determine the
parametrization of the curve.

c) Show that there exists a solution to the equation in part a) in the form

$$x(t) = R\cos t + x_0, \qquad y(t) = R\sin t.$$

Find a relation between $x_0$ and R, and from it deduce that the geodesics
are circular arcs that cut the bounding unit circle (which plays the role
of the line at infinity in the Lobachevski plane) at right angles.


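Exercise 1.8: The Lagrangian for a particle of charge q is

$$L[\mathbf{x}, \dot{\mathbf{x}}] = \frac{1}{2} m \dot{\mathbf{x}}^2 - q\phi(\mathbf{x}) + q\,\dot{\mathbf{x}} \cdot \mathbf{A}(\mathbf{x}).$$

Show that Lagrange’s equation leads to

$$m\ddot{\mathbf{x}} = q(\mathbf{E} + \dot{\mathbf{x}} \times \mathbf{B}),$$

where

$$\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \operatorname{curl} \mathbf{A}.$$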


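Exercise 1.9: Consider the action functional

$$S[\boldsymbol\omega, \mathbf{p}, \mathbf{r}] = \int \left( \frac{1}{2} I_1\omega_1^2 + \frac{1}{2} I_2\omega_2^2 + \frac{1}{2} I_3\omega_3^2 + \mathbf{p}\cdot(\dot{\mathbf{r}} + \boldsymbol\omega \times \mathbf{r}) \right) dt,$$

where r and p are time-dependent three-vectors, as is $\boldsymbol\omega = (\omega_1, \omega_2, \omega_3)$. Apply
the action principle to obtain the equations of motion for r, p, ω and show
that they lead to Euler’s equations

$$I_1\dot\omega_1 - (I_2 - I_3)\omega_2\omega_3 = 0,$$

$$I_2\dot\omega_2 - (I_3 - I_1)\omega_3\omega_1 = 0,$$

$$I_3\dot\omega_3 - (I_1 - I_2)\omega_1\omega_2 = 0,$$

governing the angular velocity of a freely-rotating rigid body.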


Problem 1.10: Piano String. An elastic piano string can vibrate both
transversely and longitudinally, and the two vibrations influence one another. A
Lagrangian that takes into account the lowest-order effect of stretching on the
local string tension, and can therefore model this coupled motion, is

$$L[\xi,\eta] = \int dx \left\{ \frac{\rho_0}{2}\left[\left(\frac{\partial\xi}{\partial t}\right)^2 + \left(\frac{\partial\eta}{\partial t}\right)^2\right] - \frac{\lambda}{2}\left[\frac{\tau_0}{\lambda} + \frac{\partial\xi}{\partial x} + \frac{1}{2}\left(\frac{\partial\eta}{\partial x}\right)^2\right]^2 \right\}.$$


Figure 1.17: Vibrating piano string.


Here ξ(x,t) is the longitudinal displacement and η(x,t) the transverse
displacement of the string. Thus, the point that in the undisturbed string had
co-ordinates [x, 0] is moved to the point with co-ordinates [x + ξ(x,t), η(x,t)].
The parameter $\tau_0$ represents the tension in the undisturbed string, λ is the
product of Young’s modulus and the cross-sectional area of the string, and $\rho_0$
is the mass per unit length.


a) Use the action principle to derive the two coupled equations of motion,
one involving $\partial^2\xi/\partial t^2$ and one involving $\partial^2\eta/\partial t^2$.
b) Show that when we linearize these two equations of motion, the
longitudinal and transverse motions decouple. Find expressions for the
longitudinal ($c_L$) and transverse ($c_T$) wave velocities in terms of $\tau_0$, $\rho_0$ and λ.
c) Assume that a given transverse pulse $\eta(x,t) = \eta_0(x - c_T t)$ propagates
along the string. Show that this induces a concurrent longitudinal pulse
of the form $\xi(x - c_T t)$. Show further that the longitudinal Newtonian
momentum density in this concurrent pulse is given by

$$\rho_0 \frac{\partial\xi}{\partial t} = \frac{1}{2}\,\frac{c_L^2}{c_L^2 - c_T^2}\, T^0{}_1,$$

where

$$T^0{}_1 \equiv -\rho_0\, \frac{\partial\eta}{\partial x}\frac{\partial\eta}{\partial t}$$

is the associated pseudo-momentum density.

The forces that created the transverse pulse will also have created other
longitudinal waves that travel at $c_L$. Consequently the Newtonian x-momentum
moving at $c_T$ is not the only x-momentum on the string, and the total “true”
longitudinal momentum density is not simply proportional to the
pseudo-momentum density.


Exercise 1.11: Obtain the canonical energy-momentum tensor $T^\mu{}_\nu$ for the
barotropic fluid described by (1.119). Show that its conservation leads to both
the momentum conservation equation (1.128), and to the energy conservation
equation

$$\partial_t \mathcal{E} + \partial_i\{v_i(\mathcal{E} + P)\} = 0,$$

where the energy density is

$$\mathcal{E} = \frac{1}{2}\rho(\nabla\phi)^2 + u(\rho).$$

Interpret the energy flux as being the sum of the convective transport of energy
together with the rate of working by an element of fluid on its neighbours.


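Problem 1.12: Consider the action functional³

$$S[\mathbf{v}, \rho, \phi, \beta, \gamma] = \int d^4x \left\{ -\frac{1}{2}\rho \mathbf{v}^2 - \phi\left(\frac{\partial\rho}{\partial t} + \operatorname{div}(\rho\mathbf{v})\right) + \rho\beta\left(\frac{\partial\gamma}{\partial t} + (\mathbf{v}\cdot\nabla)\gamma\right) + u(\rho) \right\},$$

which is a generalization of (1.177) to include two new scalar fields β and γ.
Show that varying v leads to

$$\mathbf{v} = \nabla\phi + \beta\nabla\gamma.$$

This is the Clebsch representation of the velocity field. It allows for flows with
non-zero vorticity

$$\boldsymbol\omega \equiv \operatorname{curl}\mathbf{v} = \nabla\beta \times \nabla\gamma.$$

Show that the equations that arise from varying the remaining fields ρ, φ, β,
γ, together imply the mass conservation equation

$$\dot\rho + \operatorname{div}(\rho\mathbf{v}) = 0,$$

and Bernoulli’s equation in the form

$$\frac{\partial\mathbf{v}}{\partial t} + \boldsymbol\omega \times \mathbf{v} = -\nabla\left(\frac{1}{2}\mathbf{v}^2 + h\right).$$

(Recall that h = du/dρ.) Show that this form of Bernoulli’s equation is
equivalent to Euler’s equation

$$\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\nabla h.$$

Consequently S provides an action principle for a general inviscid barotropic
flow.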


³ H. Bateman, Proc. Roy. Soc. Lond. A 125 (1929) 598-618; C. C. Lin, Liquid Helium
in Proc. Int. Sch. Phys. “Enrico Fermi”, Course XXI (Academic Press 1965).


Exercise 1.13: Drums and Membranes. The shape of a distorted drumskin is
described by the function h(x, y), which gives the height to which the point
(x,y) of the flat undistorted drumskin is displaced.


a) Show that the area of the distorted drumskin is equal to

$$\mathrm{Area}[h] = \int dx\,dy\, \sqrt{1 + \left(\frac{\partial h}{\partial x}\right)^2 + \left(\frac{\partial h}{\partial y}\right)^2},$$

where the integral is taken over the area of the flat drumskin.
b) Show that for small distortions, the area reduces to

$$A[h] = \mathrm{const.} + \frac{1}{2}\int dx\,dy\, |\nabla h|^2.$$

c) Show that if h satisfies the two-dimensional Laplace equation then A is
stationary with respect to variations that vanish at the boundary.

d) Suppose the drumskin has mass $\rho_0$ per unit area, and surface tension T.
Write down the Lagrangian controlling the motion of the drumskin, and
derive the equation of motion that follows from it.


Problem 1.14: The Wulff construction. The surface-area functional of the
previous exercise can be generalized so as to find the equilibrium shape of a
crystal. We describe the crystal surface by giving its height z(x,y) above the
x-y plane, and introduce the direction-dependent surface tension (the surface
free-energy per unit area) α(p,q), where

$$p = \frac{\partial z}{\partial x}, \qquad q = \frac{\partial z}{\partial y}. \qquad (\star)$$

We seek to minimize the total surface free energy

$$F[z] = \int dx\,dy\, \left\{\alpha(p,q)\sqrt{1 + p^2 + q^2}\right\}$$

subject to the constraint that the volume of the crystal

$$V[z] = \int z\, dx\,dy$$

remains constant.


a) Enforce the volume constraint by introducing a Lagrange multiplier $2\lambda^{-1}$,
and so obtain the Euler-Lagrange equation

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial p}\right) + \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial q}\right) = 2\lambda^{-1}.$$

Here

$$f(p,q) = \alpha(p,q)\sqrt{1 + p^2 + q^2}.$$


b) Show in the isotropic case, where α is constant, that

$$z(x,y) = \sqrt{(\alpha\lambda)^2 - (x - a)^2 - (y - b)^2} + \mathrm{const.}$$

is a solution of the Euler-Lagrange equation. In this case, therefore, the
equilibrium shape is a sphere.


An obvious way to satisfy the Euler-Lagrange equation in the general anisotropic
case would be to arrange things so that

$$x = \lambda\frac{\partial f}{\partial p}, \qquad y = \lambda\frac{\partial f}{\partial q}. \qquad (\star\star)$$


c) Show that (⋆⋆) is exactly the relationship we would have if z(x,y) and
λf(p,q) were Legendre transforms of each other, i.e. if

$$\lambda f(p,q) = px + qy - z(x,y),$$

where the x and y on the right-hand side are functions of p, q obtained
by solving (⋆). Do this by showing that the inverse relation is

$$z(x,y) = px + qy - \lambda f(p,q),$$

where now the p, q on the right-hand side become functions of x and y,
and are obtained by solving (⋆⋆).

For real crystals, α(p,q) can have the property of being a continuous-but-
nowhere-differentiable function, and so the differential calculus used in
deriving the Euler-Lagrange equation is inapplicable. The Legendre transformation,
however, has a geometric interpretation that is more robust than its
calculus-based derivation.


Recall that if we have a two-parameter family of surfaces in $\mathbb{R}^3$ given by
F(x,y,z;p,q) = 0, then the equation of the envelope of the surfaces is found
by solving the equations

$$0 = F = \frac{\partial F}{\partial p} = \frac{\partial F}{\partial q}$$

so as to eliminate the parameters p, q.


d) Show that the equation

$$F(x,y,z;p,q) = px + qy - z - \lambda\alpha(p,q)\sqrt{1 + p^2 + q^2} = 0$$

describes a family of planes perpendicular to the unit vectors

$$\mathbf{n} = \frac{(p, q, -1)}{\sqrt{1 + p^2 + q^2}}$$

and at a distance λα(p,q) away from the origin.

e) Show that the equations to be solved for the envelope of this family of
planes are exactly those that determine z(x,y). Deduce that, for smooth
α(p,q), the profile z(x,y) is this envelope.


Figure 1.18: Two-dimensional Wulff crystal. a) Polar plot of surface tension
α as a function of the normal n to a crystal face, together with a line
perpendicular to n at distance α from the origin. b) Wulff’s construction of the
corresponding crystal surface as the envelope of the family of perpendicular
lines. In this case, the minimum-energy crystal has curved faces, but sharp
corners. The envelope continues beyond the corners, but these parts are
unphysical.


Wulff conjectured⁴ that, even for non-smooth α(p,q), the minimum-energy
shape is given by an equivalent geometric construction: erect the planes from
part d) and, for each plane, discard the half-space of $\mathbb{R}^3$ that lies on the far side
of the plane from the origin. The convex region consisting of the intersection
of the retained half-spaces is the crystal. When α(p,q) is smooth this “Wulff
body” is bounded by part of the envelope of the planes. (The parts of the
envelope not bounding the convex body, the “swallowtails” visible in figure
1.18, are unphysical.) When α(p,q) has cusps, these singularities can give
rise to flat facets which are often joined by rounded edges. A proof of Wulff’s
claim had to wait 43 years, until 1944, when it was established by use of
the Brunn-Minkowski inequality.⁵


⁴ G. Wulff, “Zur Frage der Geschwindigkeit des Wachstums und der Auflösung der
Kristallflächen,” Zeitschrift für Kristallografie, 34 (1901) 449-530.

⁵ A. Dinghas, “Über einen geometrischen Satz von Wulff für die Gleichgewichtsform
von Kristallen,” Zeitschrift für Kristallografie, 105 (1944) 304-314. For a readable modern
account see: R. Gardner, “The Brunn-Minkowski inequality,” Bulletin Amer. Math. Soc.
39 (2002) 355-405.


Chapter 2 


Function Spaces 


Many differential equations of physics are relations involving linear differ- 
ential operators. These operators, like matrices, are linear maps acting on 
vector spaces. The new feature is that the elements of the vector spaces are 
functions, and the spaces are infinite dimensional. We can try to survive 
in these vast regions by relying on our experience in finite dimensions, but 
sometimes this fails, and more sophistication is required. 


2.1 Motivation 


In the previous chapter we considered two variational problems:
1) Find the stationary points of

$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}\cdot A\mathbf{x} = \frac{1}{2}x_i A_{ij} x_j \tag{2.1}$$

on the surface x · x = 1. This led to the matrix eigenvalue equation

$$A\mathbf{x} = \lambda\mathbf{x}. \tag{2.2}$$

2) Find the stationary points of

$$J[y] = \int_a^b \frac{1}{2}\left\{p(x)(y')^2 + q(x)y^2\right\} dx, \tag{2.3}$$

subject to the conditions y(a) = y(b) = 0 and

$$K[y] = \int_a^b y^2\, dx = 1. \tag{2.4}$$

This led to the differential equation

$$-(py')' + qy = \lambda y, \qquad y(a) = y(b) = 0. \tag{2.5}$$

There will be a solution that satisfies the boundary conditions only for
a discrete set of values of λ.
The stationary points of both function and functional are therefore deter- 
mined by linear eigenvalue problems. The only difference is that the finite 
matrix in the first is replaced in the second by a linear differential operator. 
The theme of the next few chapters is an exploration of the similarities and 
differences between finite matrices and linear differential operators. In this 
chapter we will focus on how the functions on which the derivatives act can 
be thought of as vectors. 
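
The parallel between the two eigenvalue problems can be made concrete numerically. The following is a minimal sketch, not taken from the text, under the simplifying assumptions p(x) = 1, q(x) = 0 and [a, b] = [0, 1], so that (2.5) becomes −y″ = λy with eigenfunctions sin(kπx) and eigenvalues (kπ)². Replacing the second derivative by the usual second-difference matrix on a uniform grid gives a finite matrix problem of the type (2.2). The grid size `n` and the helper names are illustrative choices.

```python
import math

def second_difference_matrix(n):
    """Matrix approximating -d^2/dx^2 on n interior grid points of [0, 1],
    with y(0) = y(1) = 0 built in (grid spacing h = 1/(n+1))."""
    h = 1.0 / (n + 1)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2.0 / h**2
        if i > 0:
            a[i][i - 1] = -1.0 / h**2
        if i < n - 1:
            a[i][i + 1] = -1.0 / h**2
    return a

def matvec(a, v):
    """Ordinary matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

n = 200
h = 1.0 / (n + 1)
A = second_difference_matrix(n)

def sampled_eigenfunction(k):
    """The continuum eigenfunction sin(k pi x) sampled on the grid."""
    return [math.sin(k * math.pi * (j + 1) * h) for j in range(n)]

def matrix_eigenvalue(k):
    """Rayleigh quotient v.Av / v.v for the sampled eigenfunction."""
    v = sampled_eigenfunction(k)
    av = matvec(A, v)
    return sum(x * y for x, y in zip(v, av)) / sum(x * x for x in v)

def residual(k):
    """Largest component of A v - lambda v; tiny if v is an eigenvector of A."""
    v = sampled_eigenfunction(k)
    lam = matrix_eigenvalue(k)
    return max(abs(x - lam * y) for x, y in zip(matvec(A, v), v))

for k in (1, 2, 3):
    print(k, matrix_eigenvalue(k), (k * math.pi) ** 2, residual(k))
```

In this sketch the sampled sine functions are eigenvectors of the matrix, and the matrix eigenvalues approach (kπ)² as the grid is refined. Only finitely many eigenvalues survive discretization, which is one of the differences between matrices and differential operators.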


2.1.1 Functions as vectors 


Consider F[a,b], the set of all real (or complex) valued functions f(x) on the
interval [a,b]. This is a vector space over the field of the real (or complex)
numbers: Given two functions $f_1(x)$ and $f_2(x)$, and two numbers $\lambda_1$ and $\lambda_2$,
we can form the sum $\lambda_1 f_1(x) + \lambda_2 f_2(x)$ and the result is still a function on the
same interval. Examination of the axioms listed in appendix A will show that
F[a,b] possesses all the other attributes of a vector space as well. We may
think of the array of numbers (f(x)) for x ∈ [a,b] as being the components
of the vector. Since there is an infinity of independent components (one
for each point x) the space of functions is infinite dimensional.

The set of all functions is usually too large for us. We will restrict
ourselves to subspaces of functions with nice properties, such as being continuous
or differentiable. There is some fairly standard notation for these spaces: The
space of $C^n$ functions (those which have n continuous derivatives) is called
$C^n[a,b]$. For smooth functions (those with derivatives of all orders) we write
$C^\infty[a,b]$. For the space of analytic functions (those whose Taylor expansion
actually converges to the function) we write $C^\omega[a,b]$. For $C^\infty$ functions
defined on the whole real line we write $C^\infty(\mathbb{R})$. For the subset of functions
with compact support (those that vanish outside some finite interval) we
write $C_0^\infty(\mathbb{R})$. There are no non-zero analytic functions with compact
support: $C_0^\omega(\mathbb{R}) = \{0\}$.


2.2 Norms and inner products


We are often interested in “how large” a function is. This leads to the idea of
normed function spaces. There are many measures of function size. Suppose
R(t) is the number of inches per hour of rainfall. If you are a farmer you
are probably most concerned with the total amount of rain that falls. A big
rain has big $\int |R(t)|\, dt$. If you are the Urbana city engineer worrying about
the capacity of the sewer system to cope with a downpour, you are primarily
concerned with the maximum value of R(t). For you a big rain has a big
“sup |R(t)|.”¹
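
The two measures of a “big rain” can rank the same pair of storms in opposite orders. The sketch below uses two made-up rainfall profiles over a 24-hour day (a steady drizzle and a half-hour cloudburst; both profiles and all function names are hypothetical illustrations, not data from the text) and estimates the farmer’s integral and the engineer’s supremum on a uniform grid of sample times.

```python
def total_rain(rate, t_end, steps=24000):
    """Midpoint-rule estimate of the integral of |R(t)| over [0, t_end]."""
    dt = t_end / steps
    return sum(abs(rate((i + 0.5) * dt)) for i in range(steps)) * dt

def peak_rate(rate, t_end, steps=24000):
    """Largest sampled |R(t)|; a grid estimate of sup |R(t)|."""
    dt = t_end / steps
    return max(abs(rate((i + 0.5) * dt)) for i in range(steps))

def drizzle(t):
    """Hypothetical profile: 0.1 inches per hour all day."""
    return 0.1

def cloudburst(t):
    """Hypothetical profile: 2 inches per hour for half an hour at midday."""
    return 2.0 if 12.0 <= t < 12.5 else 0.0

for name, rate in (("drizzle", drizzle), ("cloudburst", cloudburst)):
    print(name, total_rain(rate, 24.0), peak_rate(rate, 24.0))
```

With these assumed profiles the drizzle is the bigger rain for the farmer, while the cloudburst is the bigger rain for the engineer.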


2.2.1 Norms and convergence 


We can seldom write down an exact solution function to a real-world problem.
We are usually forced to use numerical methods, or to expand as a power
series in some small parameter. The result is a sequence of approximate
solutions $f_n(x)$, which we hope will converge to the desired exact solution
f(x) as we make the numerical grid smaller, or take more terms in the power
series.

Because there is more than one way to measure the “size” of a function,
the convergence of a sequence of functions $f_n$ to a limit function f is not as
simple a concept as the convergence of a sequence of numbers $x_n$ to a limit x.
Convergence means that the distance between the $f_n$ and the limit function
f gets smaller and smaller as n increases, so each different measure of this
distance provides a new notion of what it means to converge. We are not
going to make much use of formal “ε, δ” analysis, but you must realize that this
distinction between different forms of convergence is not merely academic:
real-world engineers must be precise about the kind of errors they are
prepared to tolerate, or else a bridge they design might collapse. Graduate-level
engineering courses in mathematical methods therefore devote much time to
these issues. While physicists do not normally face the same legal liabilities
as engineers, we should at least have it clear in our own minds what we mean
when we write that $f_n \to f$.


¹ Here “sup,” short for supremum, is synonymous with the “least upper bound” of a
set of numbers, i.e. the smallest number that is exceeded by no number in the set. This
concept is more useful than “maximum” because the supremum need not be an element
of the set. It is an axiom of the real number system that any bounded set of real numbers
has a least upper bound. The “greatest lower bound” is denoted “inf”, for infimum.


Here are some common forms of convergence:
i) If, for each x in its domain of definition D, the set of numbers $f_n(x)$
converges to f(x), then we say the sequence converges pointwise.
ii) If the maximum separation

$$\sup_{x\in D} |f_n(x) - f(x)| \tag{2.6}$$

goes to zero as n → ∞, then we say that $f_n$ converges to f uniformly
on D.
iii) If

$$\int_D |f_n(x) - f(x)|\, dx \tag{2.7}$$

goes to zero as n → ∞, then we say that $f_n$ converges in the mean to
f on D.
Uniform convergence implies pointwise convergence, but not vice versa. If
D is a finite interval, then uniform convergence implies convergence in the
mean, but convergence in the mean implies neither uniform nor pointwise
convergence.
Example: Consider the sequence $f_n = x^n$ (n = 1, 2, ...) and D = [0,1).
Here, the round and square bracket notation means that the point x = 0 is
included in the interval, but the point 1 is excluded.


Figure 2.1: $x^n \to 0$ on [0,1), but not uniformly.


As n becomes large we have $x^n \to 0$ pointwise in D, but the convergence is
not uniform because

$$\sup_{x\in D} |x^n - 0| = 1 \tag{2.8}$$

for all n.

Example: Let $f_n = x^n$ with D = [0,1]. Now the two square brackets
mean that both x = 0 and x = 1 are to be included in the interval. In this
case we have neither uniform nor pointwise convergence of the $x^n$ to zero,
but $x^n \to 0$ in the mean.
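
The contrast between the three kinds of convergence in these examples can be checked numerically. The sketch below is an illustration under stated assumptions: the supremum over [0,1) is only estimated from a finite grid (so it falls slightly short of 1), and the mean distance is a midpoint-rule estimate of the exact value 1/(n+1). The function names and grid sizes are arbitrary choices.

```python
def sup_estimate(n, points=100000):
    """Largest value of x**n on the grid x = i/points, i = 0..points-1,
    which lies inside [0, 1). Refining the grid pushes this towards 1."""
    return max((i / points) ** n for i in range(points))

def mean_distance(n, steps=20000):
    """Midpoint-rule estimate of the integral of |x**n - 0| over [0, 1]."""
    dx = 1.0 / steps
    return sum(((i + 0.5) * dx) ** n for i in range(steps)) * dx

for n in (1, 10, 50):
    # The value at a fixed interior point decays, the sup estimate does not,
    # and the mean distance decays like 1/(n+1).
    print(n, 0.5 ** n, sup_estimate(n), mean_distance(n), 1.0 / (n + 1))
```

The value at any fixed x < 1 shrinks with n (pointwise convergence), the sup estimate stays close to 1 (no uniform convergence), and the mean distance tends to zero.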

We can describe uniform convergence by means of a norm, a generalization
of the usual measure of the length of a vector. A norm, denoted by
$\|f\|$, of a vector f (a function, in our case) is a real number that obeys

i) positivity: $\|f\| \ge 0$, and $\|f\| = 0 \Leftrightarrow f = 0$,
ii) the triangle inequality: $\|f + g\| \le \|f\| + \|g\|$,
iii) linear homogeneity: $\|\lambda f\| = |\lambda|\,\|f\|$.

One example is the “sup” norm, which is defined by

$$\|f\|_\infty = \sup_{x\in D} |f(x)|. \tag{2.9}$$

This number is guaranteed to be finite if f is continuous and D is compact.
In terms of the sup norm, uniform convergence is the statement that

$$\lim_{n\to\infty} \|f_n - f\|_\infty = 0. \tag{2.10}$$


2.2.2 Norms from integrals 


The space $L^p[a,b]$, for any $1 \le p < \infty$, is defined to be our F[a,b] equipped
with

$$\|f\|_p = \left(\int_a^b |f(x)|^p\, dx\right)^{1/p} \tag{2.11}$$

as the measure of length, and with a restriction to functions for which $\|f\|_p$
is finite.

We say that $f_n \to f$ in $L^p$ if the $L^p$ distance $\|f - f_n\|_p$ tends to zero. We
have already seen the $L^1$ measure of distance in the definition of convergence
in the mean. As in that case, convergence in $L^p$ says nothing about pointwise
convergence.
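
As a small worked illustration of (2.11), the sketch below estimates $\|f\|_p$ by the midpoint rule for f(x) = x on [0,1], where the exact value is $(p+1)^{-1/p}$. The helper name `lp_norm` and the step count are assumptions of this example rather than anything defined in the text.

```python
def lp_norm(f, a, b, p, steps=20000):
    """Midpoint-rule estimate of ( integral_a^b |f(x)|^p dx )^(1/p)."""
    dx = (b - a) / steps
    total = sum(abs(f(a + (i + 0.5) * dx)) ** p for i in range(steps)) * dx
    return total ** (1.0 / p)

def identity(x):
    return x

for p in (1, 2, 4, 16):
    # Compare the numerical estimate with the exact value (p+1)^(-1/p).
    print(p, lp_norm(identity, 0.0, 1.0, p), (p + 1) ** (-1.0 / p))
```

For this f the norms increase with p and approach the sup norm, which is 1.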

We would like to regard $\|f\|_p$ as a norm. It is possible, however, for a
function to have $\|f\|_p = 0$ without f being identically zero: a function
that vanishes at all but a finite set of points, for example. This pathology
violates number i) in our list of requirements for something to be called a
norm, but we circumvent the problem by simply declaring such functions to
be zero. This means that elements of the $L^p$ spaces are not really functions,
but only equivalence classes of functions, two functions being regarded as
the same if they differ by a function of zero length. Clearly these spaces are
not for use when anything significant depends on the value of the function
at any precise point. They are useful in physics, however, because we can
never measure a quantity at an exact position in space or time. We usually
measure some sort of local average.

The $L^p$ norms satisfy the triangle inequality for all $1 \le p \le \infty$, although
this is not exactly trivial to prove.

An important property for any space to have is that of being complete. 
Roughly speaking, a space is complete if when some sequence of elements of 
the space look as if they are converging, then they are indeed converging and 
their limit is an element of the space. To make this concept precise, we need 
to say what we mean by the phrase “look as if they are converging.” This 
we do by introducing the idea of a Cauchy sequence. 

Definition: A sequence $f_n$ in a normed vector space is Cauchy if for any ε > 0
we can find an N such that n, m > N implies that $\|f_m - f_n\| < \varepsilon$.

This definition can be loosely paraphrased to say that the elements of a
Cauchy sequence get arbitrarily close to each other as n → ∞.

A normed vector space is complete with respect to its norm if every
Cauchy sequence actually converges to some element in the space. Consider,
for example, the normed vector space ℚ of rational numbers with distance
measured in the usual way as $\|q_1 - q_2\| = |q_1 - q_2|$. The sequence

$$q_0 = 1.0, \quad q_1 = 1.4, \quad q_2 = 1.41, \quad q_3 = 1.414, \quad \ldots$$

consisting of successive decimal approximations to $\sqrt{2}$, obeys

$$|q_n - q_m| \le 10^{-\min(n,m)} \tag{2.12}$$

and so is Cauchy. Pythagoras famously showed that $\sqrt{2}$ is irrational, however,
and so this sequence of rational numbers has no limit in ℚ. Thus ℚ is not
complete. The space ℝ of real numbers is constructed by filling in the gaps
between the rationals, and so completing ℚ. A real number such as $\sqrt{2}$
is defined as a Cauchy sequence of rational numbers (by giving a rule, for
example, that determines its infinite decimal expansion), with two rational
sequences $q_n$ and $q_n'$ defining the same real number if $q_n - q_n'$ converges to
zero.
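
The decimal approximations to $\sqrt{2}$ can be generated with exact rational arithmetic, which makes the Cauchy bound (2.12) easy to check. This is a sketch using Python’s standard `fractions.Fraction` and `math.isqrt`; the function name `q` and the range of n tested are choices made for this example.

```python
from fractions import Fraction
from math import isqrt

def q(n):
    """Decimal approximation to sqrt(2), truncated after n decimal places,
    returned as an exact rational number."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

# The first few members of the sequence: 1, 14/10, 141/100, 1414/1000.
print([q(n) for n in range(4)])

# Check the bound |q_n - q_m| <= 10^(-min(n, m)) for small n and m.
bound_holds = all(
    abs(q(n) - q(m)) <= Fraction(1, 10 ** min(n, m))
    for n in range(12)
    for m in range(12)
)
print(bound_holds)

# No member of the sequence squares to exactly 2, so none of them is the limit.
print(all(q(n) ** 2 != 2 for n in range(12)))
```

Every member of the sequence is rational and the members crowd together, yet the number they crowd towards is not in ℚ.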

A complete normed vector space is called a Banach space. If we interpret
the norms as Lebesgue integrals² then the $L^p[a,b]$ are complete, and therefore
Banach spaces. The theory of Lebesgue integration is rather complicated,
however, and is not really necessary. One way of avoiding it is explained in
exercise 2.2.


Exercise 2.1: Show that any convergent sequence is Cauchy. 


2.2.3 Hilbert space 


The Banach space $L^2[a,b]$ is special in that it is also a Hilbert space. This
means that its norm is derived from an inner product. If we define the inner
product

$$\langle f, g\rangle = \int_a^b f^*(x)\, g(x)\, dx \tag{2.13}$$

then the $L^2[a,b]$ norm can be written

$$\|f\|_2 = \sqrt{\langle f, f\rangle}. \tag{2.14}$$

When we omit the subscript on a norm, we mean it to be this one. You
are probably familiar with this Hilbert space from your quantum mechanics
classes.

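Being positive definite, the inner product satisfies the Cauchy-Schwarz-
Bunyakovsky inequality

$$|\langle f, g\rangle| \le \|f\|\,\|g\|. \tag{2.15}$$

That this is so can be seen by observing that

$$\langle \lambda f + \mu g, \lambda f + \mu g\rangle = (\lambda^*, \mu^*) \begin{pmatrix} \|f\|^2 & \langle f, g\rangle \\ \langle f, g\rangle^* & \|g\|^2 \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \end{pmatrix} \tag{2.16}$$

must be non-negative for any choice of λ and μ. We therefore select $\lambda = \|g\|$,
$\mu = -\langle f, g\rangle^* \|g\|^{-1}$, in which case the non-negativity of (2.16) becomes the
statement that

$$\|f\|^2\|g\|^2 - |\langle f, g\rangle|^2 \ge 0. \tag{2.17}$$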
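
The choice of λ and μ in this argument can be tested numerically. The sketch below approximates the inner product (2.13) on [0, 1] by a midpoint sum (which is itself a positive semi-definite inner product on the sampled values, so the algebra carries over), picks two arbitrary test functions, and compares $\langle \lambda f + \mu g, \lambda f + \mu g\rangle$ with $\|f\|^2\|g\|^2 - |\langle f, g\rangle|^2$. The functions `f` and `g` and all helper names are illustrative assumptions.

```python
import cmath
import math

def inner(f, g, a=0.0, b=1.0, steps=2000):
    """Midpoint-rule estimate of <f, g> = integral_a^b conj(f(x)) g(x) dx."""
    dx = (b - a) / steps
    total = 0j
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += complex(f(x)).conjugate() * g(x)
    return total * dx

def norm(f):
    """Norm derived from the inner product, ||f|| = sqrt(<f, f>)."""
    return math.sqrt(inner(f, f).real)

def f(x):
    """Arbitrary complex-valued test function."""
    return x * cmath.exp(2j * math.pi * x)

def g(x):
    """Arbitrary real-valued test function."""
    return 1.0 + x * x

fg = inner(f, g)
lam = norm(g)                      # lambda = ||g||
mu = -fg.conjugate() / norm(g)     # mu = -<f, g>* / ||g||

def h(x):
    """The combination lambda f + mu g used in the argument."""
    return lam * f(x) + mu * g(x)

lhs = inner(h, h).real
rhs = norm(f) ** 2 * norm(g) ** 2 - abs(fg) ** 2
print(lhs, rhs, abs(fg), norm(f) * norm(g))
```

The two printed quantities `lhs` and `rhs` agree to rounding error, and both are non-negative, which is the content of (2.17).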


² The “L” in $L^p$ honours Henri Lebesgue. Banach spaces are named after Stefan Banach,
who was one of the founders of functional analysis, a subject largely developed by him
and other habitués of the Scottish Café in Lwów, Poland.


From Cauchy-Schwarz-Bunyakovsky we can establish the triangle inequality:

$$\begin{aligned}
\|f + g\|^2 &= \|f\|^2 + \|g\|^2 + 2\,\mathrm{Re}\langle f, g\rangle \\
&\le \|f\|^2 + \|g\|^2 + 2|\langle f, g\rangle| \\
&\le \|f\|^2 + \|g\|^2 + 2\|f\|\,\|g\| \\
&= (\|f\| + \|g\|)^2,
\end{aligned} \tag{2.18}$$

so

$$\|f + g\| \le \|f\| + \|g\|. \tag{2.19}$$

A second important consequence of Cauchy-Schwarz-Bunyakovsky is that
if $f_n \to f$ in the sense that $\|f_n - f\| \to 0$, then

$$\begin{aligned}
|\langle f_n, g\rangle - \langle f, g\rangle| &= |\langle (f_n - f), g\rangle| \\
&\le \|f_n - f\|\,\|g\|
\end{aligned} \tag{2.20}$$

tends to zero, and so

$$\langle f_n, g\rangle \to \langle f, g\rangle. \tag{2.21}$$

This means that the inner product ⟨f, g⟩ is a continuous functional of f and
g. Take care to note that this continuity hinges on $\|g\|$ being finite. It is for
this reason that we do not permit $\|g\| = \infty$ functions to be elements of our
Hilbert space.


Orthonormal sets 


Once we are in possession of an inner product, we can introduce the notion
of an orthonormal set. A set of functions $\{u_n\}$ is orthonormal if

$$\langle u_n, u_m\rangle = \delta_{nm}. \tag{2.22}$$

For example,

$$2\int_0^1 \sin(n\pi x)\sin(m\pi x)\, dx = \delta_{nm}, \qquad n, m = 1, 2, \ldots \tag{2.23}$$

so the set of functions $u_n = \sqrt{2}\sin n\pi x$ is orthonormal on [0,1]. This set of
functions is also complete, in a different sense, however, from our earlier
use of this word. An orthonormal set of functions is said to be complete if any
function f for which $\|f\|^2$ is finite, and hence f an element of the Hilbert
space, has a convergent expansion

$$f(x) = \sum_{n=1}^{\infty} a_n u_n(x).$$

If we assume that such an expansion exists, and that we can freely interchange
the order of the sum and integral, we can multiply both sides of this expansion
by $u_m^*(x)$, integrate over x, and use the orthonormality of the $u_n$’s to read
off the expansion coefficients as $a_n = \langle u_n, f\rangle$. When

$$\|f\|^2 = \int_0^1 |f(x)|^2\, dx \tag{2.24}$$

and $u_n = \sqrt{2}\sin(n\pi x)$, the result is the half-range sine Fourier series.


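Example: Expanding unity. Suppose f(x) = 1. Since $\int_0^1 |f|^2\, dx = 1$ is
finite, the function f(x) = 1 can be represented as a convergent sum of the
$u_n = \sqrt{2}\sin(n\pi x)$.


The inner product of f with the $u_n$’s is

$$\langle u_n, f\rangle = \int_0^1 \sqrt{2}\sin(n\pi x)\, dx = \begin{cases} 0, & n \text{ even}, \\ \dfrac{2\sqrt{2}}{n\pi}, & n \text{ odd}. \end{cases}$$

Thus,

$$1 = \sum_{n=0}^{\infty} \frac{4}{(2n+1)\pi}\sin\big((2n+1)\pi x\big), \qquad \text{in } L^2[0,1]. \tag{2.25}$$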

It is important to understand that the sum converges to the left-hand side
in the closed interval [0, 1] only in the $L^2$ sense. The series does not converge
pointwise to unity at x = 0 or x = 1: every term is zero at these points.


Figure 2.2: The sum of the first 31 terms in the sine expansion of f(x) = 1.


Figure 2.2 shows the sum of the series up to and including the term with
n = 30. The $L^2[0,1]$ measure of the distance between f(x) = 1 and this sum
is

$$\int_0^1 \left|1 - \sum_{n=0}^{30} \frac{4}{(2n+1)\pi}\sin\big((2n+1)\pi x\big)\right|^2 dx = 0.00654. \tag{2.26}$$

We can make this number as small as we desire by taking sufficiently many
terms.

It is perhaps surprising that a set of functions that vanish at the
endpoints of the interval can be used to expand a function that does not vanish
at the ends. This exposes an important technical point: Any finite sum of
continuous functions vanishing at the endpoints is also a continuous function
vanishing at the endpoints. It is therefore tempting to talk about the
“subspace” of such functions. This set is indeed a vector space, and a subset of
the Hilbert space, but it is not itself a Hilbert space. As the example shows,
a Cauchy sequence of continuous functions vanishing at the endpoints of an
interval can converge to a continuous function that does not vanish there.
The “subspace” is therefore not complete in our original meaning of the term.
The set of continuous functions vanishing at the endpoints fits into the whole
Hilbert space much as the rational numbers fit into the real numbers: A
finite sum of rationals is a rational number, but an infinite sum of rationals
is not in general a rational number and we can obtain any real number as
the limit of a sequence of rational numbers. The rationals ℚ are therefore
a dense subset of the reals, and, as explained earlier, the reals are obtained
by completing the set of rationals by adding to this set its limit points. In
the same sense, the set of continuous functions vanishing at the endpoints is
a dense subset of the whole Hilbert space and the whole Hilbert space is its
completion.
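
The behaviour described here can be reproduced with a few lines of code. The sketch below evaluates the partial sum of the sine expansion of f(x) = 1 through n = 30, checks that it vanishes at the endpoints, and estimates the mean-square distance from f in two ways: by a midpoint-rule integral and from the coefficients alone, using the orthonormality of the $u_n$ (the distance is then $1 - \sum_n |\langle u_n, f\rangle|^2$). The function names and step count are assumptions of this example.

```python
import math

def partial_sum(x, n_max=30):
    """Sum over n = 0..n_max of 4/((2n+1) pi) * sin((2n+1) pi x)."""
    return sum(
        4.0 / ((2 * n + 1) * math.pi) * math.sin((2 * n + 1) * math.pi * x)
        for n in range(n_max + 1)
    )

def l2_distance_squared(n_max=30, steps=5000):
    """Midpoint-rule estimate of integral_0^1 |1 - partial_sum(x)|^2 dx."""
    dx = 1.0 / steps
    return sum(
        (1.0 - partial_sum((i + 0.5) * dx, n_max)) ** 2 for i in range(steps)
    ) * dx

def parseval_remainder(n_max=30):
    """Same distance from the coefficients: 1 - sum of (2 sqrt(2)/((2n+1) pi))^2."""
    return 1.0 - sum(
        8.0 / ((2 * n + 1) ** 2 * math.pi ** 2) for n in range(n_max + 1)
    )

print(partial_sum(0.0), partial_sum(0.5), partial_sum(1.0))
print(l2_distance_squared(), parseval_remainder())
for n_max in (30, 100, 300):
    print(n_max, parseval_remainder(n_max))
```

Both estimates are consistent with the value quoted in (2.26), and the remainder keeps shrinking as more terms are included, even though every partial sum is pinned to zero at x = 0 and x = 1.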


Exercise 2.2: In this technical exercise we will explain in more detail how
we “complete” a Hilbert space. The idea is to mirror the construction of
the real numbers and define the elements of the Hilbert space to be Cauchy
sequences of continuous functions. To specify a general element of $L^2[a,b]$
we must therefore exhibit a Cauchy sequence $f_n \in C[a,b]$. The choice is not
unique: two Cauchy sequences $f_n^{(1)}(x)$ and $f_n^{(2)}(x)$ will specify the same
element if

$$\lim_{n\to\infty} \|f_n^{(1)} - f_n^{(2)}\| = 0.$$

Such sequences are said to be equivalent. For convenience, we will write
“$\lim_{n\to\infty} f_n = f$” but bear in mind that, in this exercise, this means that
the sequence $f_n$ defines the symbol f, and not that f is the limit of the
sequence, as this limit need have no prior existence. We have deliberately written
“f”, and not “f(x)”, for the “limit function” to warn us that f is assigned no
unique numerical value at any x. A continuous function f(x) can still be
considered to be an element of $L^2[a,b]$ (take a sequence in which every $f_n(x)$ is
equal to f(x)) but an equivalent sequence of $f_n(x)$ can alter the limiting f(x)
on a set of measure zero without changing the resulting element $f \in L^2[a,b]$.


i) If $f_n$ and $g_n$ are Cauchy sequences defining f, g, respectively, it is natural
to try to define the inner product ⟨f, g⟩ by setting

$$\langle f, g\rangle = \lim_{n\to\infty} \langle f_n, g_n\rangle.$$

Use the Cauchy-Schwarz-Bunyakovsky inequality to show that the numbers
$F_n = \langle f_n, g_n\rangle$ form a Cauchy sequence in ℂ. Since ℂ is complete,
deduce that this limit exists. Next show that the limit is unaltered if
either $f_n$ or $g_n$ is replaced by an equivalent sequence. Conclude that our
tentative inner product is well defined.

ii) The next, and harder, task is to show that the “completed” space is
indeed complete. The problem is to show that a given Cauchy sequence
$f_k \in L^2[a,b]$, where the $f_k$ are not necessarily in C[a,b], has a limit
in $L^2[a,b]$. Begin by taking Cauchy sequences $f_{k,i} \in C[a,b]$ such that
$\lim_{i\to\infty} f_{k,i} = f_k$. Use the triangle inequality to show that we can select
a subsequence $f_{k,i(k)}$ that is Cauchy and so defines the desired limit.


Later we will show that the elements of $L^2[a,b]$ can be given a concrete meaning
as distributions.


Best approximation


Let $u_n(x)$ be an orthonormal set of functions. The sum of the first $N$ terms of
the Fourier expansion of $f(x)$ in the $u_n$ is the closest, measuring distance
with the $L^2$ norm, that one can get to $f$ whilst remaining in the space
spanned by $u_1, u_2, \ldots, u_N$.

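To see this, consider the square of the error-distance:

$$
\begin{aligned}
\Delta \equiv \Big\| f - \sum_{1}^{N} a_n u_n \Big\|^2
 &= \Big\langle f - \sum_{m=1}^{N} a_m u_m ,\; f - \sum_{n=1}^{N} a_n u_n \Big\rangle \\
 &= \|f\|^2 - \sum_{n=1}^{N} a_n \langle f, u_n\rangle - \sum_{m=1}^{N} a_m^* \langle u_m, f\rangle
    + \sum_{n,m=1}^{N} a_m^* a_n \langle u_m, u_n\rangle \\
 &= \|f\|^2 - \sum_{n=1}^{N} a_n \langle f, u_n\rangle - \sum_{m=1}^{N} a_m^* \langle u_m, f\rangle
    + \sum_{n=1}^{N} |a_n|^2.
\end{aligned}
\tag{2.27}
$$

In the last line we have used the orthonormality of the $u_n$. We can complete
the squares, and rewrite $\Delta$ as

$$ \Delta = \|f\|^2 - \sum_{n=1}^{N} |\langle u_n, f\rangle|^2 + \sum_{n=1}^{N} |a_n - \langle u_n, f\rangle|^2. \tag{2.28} $$

We seek to minimize $\Delta$ by a suitable choice of coefficients $a_n$. The smallest
we can make it is

$$ \Delta_{\min} = \|f\|^2 - \sum_{n=1}^{N} |\langle u_n, f\rangle|^2, \tag{2.29} $$

and we attain this bound by setting each of the $|a_n - \langle u_n, f\rangle|$ equal to zero.
That is, by taking

$$ a_n = \langle u_n, f\rangle. \tag{2.30} $$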


Thus the Fourier coefficients $\langle u_n, f\rangle$ are the optimal choice for the $a_n$.

Suppose we have some non-orthogonal collection of functions $g_n$, $n = 1,\ldots,N$,
and we have found the best approximation $\sum_{n=1}^{N} a_n g_n(x)$ to $f(x)$.
Now suppose we are given a $g_{N+1}$ to add to our collection. We may then seek
an improved approximation $\sum_{n=1}^{N+1} a'_n g_n(x)$ by including this new function,
but finding this better fit will generally involve tweaking all the $a_n$, not just
trying different values of $a_{N+1}$. The great advantage of approximating by
orthogonal functions is that, given another member of an orthonormal family,
we can improve the precision of the fit by adjusting only the coefficient of the
new term. We do not have to perturb the previously obtained coefficients.
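
The completed-square form (2.28) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the target function $f(x) = e^x$ on $[-1,1]$, uses rescaled Legendre polynomials as the orthonormal set, and evaluates all integrals with a composite Simpson rule. The names `simpson`, `legendre`, `u` and `error` are illustrative choices.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def legendre(n, x):
    """P_n(x) via (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def u(n, x):
    """Legendre polynomial rescaled to unit L^2 norm on [-1, 1]."""
    return math.sqrt((2 * n + 1) / 2) * legendre(n, x)

def error(coeffs):
    """Delta = ||f - sum_n a_n u_n||^2 for f = exp, integrated directly."""
    def residual_sq(x):
        approx = sum(a * u(n, x) for n, a in enumerate(coeffs))
        return (math.exp(x) - approx) ** 2
    return simpson(residual_sq, -1.0, 1.0)

N = 4
fourier = [simpson(lambda x, n=n: u(n, x) * math.exp(x), -1.0, 1.0) for n in range(N)]
norm_sq = simpson(lambda x: math.exp(2 * x), -1.0, 1.0)

delta_min = norm_sq - sum(a * a for a in fourier)  # right-hand side of (2.29)
delta_direct = error(fourier)                      # error with the Fourier coefficients
shifted = list(fourier)
shifted[1] += 0.05                                 # move one coefficient off its optimum
delta_shifted = error(shifted)

print(delta_min, delta_direct, delta_shifted - delta_direct)
```

Under these assumptions the first two printed numbers should agree, and the third should equal the square of the shift, as the last term of (2.28) predicts.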


Parseval’s theorem 


The “best approximation” result from the previous section allows us to give
an alternative definition of a “complete orthonormal set,” and to obtain the
formula $a_n = \langle u_n, f\rangle$ for the expansion coefficients without having to assume
that we can integrate the infinite series $\sum a_n u_n$ term-by-term. Recall that
we said that a set of points $S$ is a dense subset of a space $T$ if any given
point $x \in T$ is the limit of a sequence of points in $S$, i.e. there are elements
of $S$ lying arbitrarily close to $x$. For example, the set of rational numbers $\mathbb{Q}$
is a dense subset of $\mathbb{R}$. Using this language, we say that a set of orthonormal
functions $\{u_n(x)\}$ is complete if the set of all finite linear combinations of
the $u_n$ is a dense subset of the entire Hilbert space. This guarantees that, by
taking $N$ sufficiently large, our best approximation will approach arbitrarily
close to our target function $f(x)$. Since the best approximation containing
all the $u_n$ up to $u_N$ is the $N$-th partial sum of the Fourier series, this shows
that the Fourier series actually converges to $f$.

We have therefore proved that if we are given $u_n(x)$, $n = 1, 2, \ldots$, a
complete orthonormal set of functions on $[a,b]$, then any function for which
$\|f\|^2$ is finite can be expanded as a convergent Fourier series

$$ f(x) = \sum_{n=1}^{\infty} a_n u_n(x), \tag{2.31} $$

where

$$ a_n = \langle u_n, f\rangle = \int_a^b u_n^*(x) f(x)\,dx. \tag{2.32} $$

The convergence is guaranteed only in the $L^2$ sense that

$$ \lim_{N\to\infty} \int_a^b \Big| f(x) - \sum_{n=1}^{N} a_n u_n(x) \Big|^2 dx = 0. \tag{2.33} $$

Equivalently

$$ \Delta_N = \Big\| f - \sum_{n=1}^{N} a_n u_n \Big\|^2 \to 0 \tag{2.34} $$

as $N \to \infty$. Now, we showed in the previous section that

$$ \Delta_N = \|f\|^2 - \sum_{n=1}^{N} |\langle u_n, f\rangle|^2 = \|f\|^2 - \sum_{n=1}^{N} |a_n|^2, \tag{2.35} $$

and so the $L^2$ convergence is equivalent to the statement that

$$ \|f\|^2 = \sum_{n=1}^{\infty} |a_n|^2. \tag{2.36} $$


This last result is called Parseval’s theorem. 
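Example: In the expansion (2.25), we have $\|f\|^2 = 1$ and

$$ |a_n|^2 = \begin{cases} \dfrac{8}{n^2\pi^2}, & n \text{ odd}, \\ 0, & n \text{ even}. \end{cases} \tag{2.37} $$

Parseval therefore tells us that

$$ \sum_{n=0}^{\infty} \frac{1}{(2n+1)^2} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots = \frac{\pi^2}{8}. \tag{2.38} $$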


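Example: The functions $u_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$, $n \in \mathbb{Z}$, form a complete
orthonormal set on the interval $[-\pi, \pi]$. Let $f(x) = \frac{1}{\sqrt{2\pi}} e^{i\zeta x}$. Then its
Fourier expansion is

$$ \frac{1}{\sqrt{2\pi}} e^{i\zeta x} = \sum_{n=-\infty}^{\infty} c_n \frac{1}{\sqrt{2\pi}} e^{inx}, \qquad -\pi < x < \pi, \tag{2.39} $$

where

$$ c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\zeta x} e^{-inx}\,dx = \frac{\sin(\pi(\zeta - n))}{\pi(\zeta - n)}. \tag{2.40} $$

We also have that

$$ \|f\|^2 = \int_{-\pi}^{\pi} \frac{1}{2\pi}\,dx = 1. \tag{2.41} $$

Now Parseval tells us that

$$ \|f\|^2 = \sum_{n=-\infty}^{\infty} \frac{\sin^2(\pi(\zeta - n))}{\pi^2(\zeta - n)^2}, \tag{2.42} $$

the left hand side being unity.

Finally, as $\sin^2(\pi(\zeta - n)) = \sin^2(\pi\zeta)$, we have

$$ \operatorname{cosec}^2(\pi\zeta) \equiv \frac{1}{\sin^2(\pi\zeta)} = \sum_{n=-\infty}^{\infty} \frac{1}{\pi^2(\zeta - n)^2}. \tag{2.43} $$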


The end result is a quite non-trivial expansion for the square of the cosecant. 
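
Both Parseval sums above can be checked by brute force. This is a small illustrative sketch rather than a proof: the cutoff of 100,000 terms and the sample value $\zeta = 0.3$ are arbitrary assumptions, and because the series converge only like $1/N$ the agreement is limited to about five decimal places.

```python
import math

def odd_inverse_squares(terms):
    """Partial sum of 1/(2n+1)^2 for n = 0 .. terms-1, the left side of (2.38)."""
    return sum(1.0 / (2 * n + 1) ** 2 for n in range(terms))

def cosec_sq_series(zeta, cutoff):
    """Symmetric partial sum over n = -cutoff .. cutoff of 1/(pi^2 (zeta - n)^2)."""
    return sum(1.0 / (math.pi * (zeta - n)) ** 2 for n in range(-cutoff, cutoff + 1))

zeta = 0.3  # any real non-integer value; chosen only for illustration
lhs_238 = odd_inverse_squares(100_000)
rhs_238 = math.pi ** 2 / 8
lhs_243 = 1.0 / math.sin(math.pi * zeta) ** 2
rhs_243 = cosec_sq_series(zeta, 100_000)

print(lhs_238, rhs_238)
print(lhs_243, rhs_243)
```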


2.2.4 Orthogonal polynomials 


A useful class of orthonormal functions are the sets of orthogonal polynomials
associated with an interval $[a,b]$ and a positive weight function $w(x)$ such
that $\int_a^b w(x)\,dx$ is finite. We introduce the Hilbert space $L^2_w[a,b]$ with the
real inner product

$$ \langle u, v\rangle_w = \int_a^b w(x) u(x) v(x)\,dx, \tag{2.44} $$

and apply the Gram-Schmidt procedure to the monomial powers $1, x, x^2, x^3, \ldots$
so as to produce an orthonormal set. We begin with

$$ P_0(x) \equiv 1/\|1\|_w, \tag{2.45} $$

where $\|1\|_w = \sqrt{\int_a^b w(x)\,dx}$, and define recursively

$$ P_{n+1}(x) = \frac{x P_n(x) - \sum_{i=0}^{n} P_i \langle P_i, x P_n\rangle_w}{\big\| x P_n - \sum_{i=0}^{n} P_i \langle P_i, x P_n\rangle_w \big\|_w}. \tag{2.46} $$

Clearly $P_n(x)$ is an $n$-th order polynomial, and by construction

$$ \langle P_n, P_m\rangle_w = \delta_{nm}. \tag{2.47} $$

All such sets of polynomials obey a three-term recurrence relation

$$ x P_n(x) = b_n P_{n+1}(x) + a_n P_n(x) + b_{n-1} P_{n-1}(x). \tag{2.48} $$

That there are only three terms, and that the coefficients of $P_{n+1}$ and $P_{n-1}$
are related, is due to the identity

$$ \langle P_n, x P_m\rangle_w = \langle x P_n, P_m\rangle_w. \tag{2.49} $$




This means that the matrix (in the $P_n$ basis) representing the operation of
multiplication by $x$ is symmetric. Since multiplication by $x$ takes us from
$P_n$ to a polynomial of degree $n+1$, and so no higher than $P_{n+1}$, the matrix
has just one non-zero entry above the main
diagonal, and hence, by symmetry, only one below.
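
The construction (2.45)-(2.46) and the resulting tridiagonal structure can be seen in a small computation. The sketch below is illustrative only: it assumes the simplest case $w(x) = 1$ on $[-1,1]$, stores polynomials as coefficient lists (lowest power first), and uses the exact moments $\int_{-1}^{1} x^k\,dx$ for the inner product. The helper names are hypothetical.

```python
import math

def times_x(p):
    """Multiply a polynomial (coefficients, lowest power first) by x."""
    return [0.0] + list(p)

def inner(p, q):
    """<p, q>_w on [-1, 1] with w(x) = 1, using the exact moments of x^k."""
    total = 0.0
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if (i + j) % 2 == 0:          # odd powers integrate to zero
                total += pi * qj * 2.0 / (i + j + 1)
    return total

def subtract(p, q, c):
    """Return p - c*q, padding the shorter coefficient list with zeros."""
    size = max(len(p), len(q))
    p = list(p) + [0.0] * (size - len(p))
    q = list(q) + [0.0] * (size - len(q))
    return [a - c * b for a, b in zip(p, q)]

def gram_schmidt(count):
    """Orthonormal polynomials P_0 .. P_{count-1} built as in (2.45)-(2.46)."""
    basis = [[1.0 / math.sqrt(inner([1.0], [1.0]))]]
    while len(basis) < count:
        xp = times_x(basis[-1])
        r = xp
        for q in basis:
            r = subtract(r, q, inner(q, xp))
        norm = math.sqrt(inner(r, r))
        basis.append([c / norm for c in r])
    return basis

SIZE = 5
P = gram_schmidt(SIZE)
# M[m][n] = <P_m, x P_n>_w, the matrix of "multiplication by x" in the P_n basis
M = [[inner(P[m], times_x(P[n])) for n in range(SIZE)] for m in range(SIZE)]
for row in M:
    print(["%7.4f" % entry for entry in row])
```

For this weight the printed block should be symmetric with non-zero entries only next to the diagonal, and those entries are the $b_n$ of (2.48).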

The completeness of a family of polynomials orthogonal on a finite interval
is guaranteed by the Weierstrass approximation theorem, which asserts that
for any continuous real function $f(x)$ on $[a,b]$, and for any $\varepsilon > 0$, there exists
a polynomial $p(x)$ such that $|f(x) - p(x)| < \varepsilon$ for all $x \in [a,b]$. This means
that polynomials are dense in the space of continuous functions equipped
with the $\|\ldots\|_\infty$ norm. Because $|f(x) - p(x)| < \varepsilon$ implies that

$$ \int_a^b |f(x) - p(x)|^2 w(x)\,dx \le \varepsilon^2 \int_a^b w(x)\,dx, \tag{2.50} $$

they are also a dense subset of the continuous functions in the sense of $L^2_w[a,b]$
convergence. Because the Hilbert space $L^2_w[a,b]$ is defined to be the completion
of the space of continuous functions, the continuous functions are automatically
dense in $L^2_w[a,b]$. Now the triangle inequality tells us that a dense
subset of a dense set is dense in the larger set, so the polynomials are dense in
$L^2_w[a,b]$ itself. The normalized orthogonal polynomials therefore constitute a
complete orthonormal set.

For later use, we here summarize the properties of the families of polynomials
named after Legendre, Hermite and Tchebychef.


Legendre polynomials 


Legendre polynomials have $a = -1$, $b = 1$ and $w = 1$. The standard Legendre
polynomials are not normalized by the scalar product, but instead by setting
$P_n(1) = 1$. They are given by Rodrigues' formula

$$ P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n} (x^2 - 1)^n. \tag{2.51} $$

The first few are

$$
\begin{aligned}
P_0(x) &= 1, \\
P_1(x) &= x, \\
P_2(x) &= \tfrac{1}{2}(3x^2 - 1), \\
P_3(x) &= \tfrac{1}{2}(5x^3 - 3x), \\
P_4(x) &= \tfrac{1}{8}(35x^4 - 30x^2 + 3).
\end{aligned}
$$

Their inner product is

$$ \int_{-1}^{1} P_n(x) P_m(x)\,dx = \frac{2}{2n+1}\,\delta_{nm}. \tag{2.52} $$

The three-term recurrence relation is

$$ (2n+1)\,x P_n(x) = (n+1) P_{n+1}(x) + n P_{n-1}(x). \tag{2.53} $$

The $P_n$ form a complete set for expanding functions on $[-1, 1]$.


Hermite polynomials 


The Hermite polynomials have $a = -\infty$, $b = +\infty$ and $w(x) = e^{-x^2}$, and are
defined by the generating function

$$ e^{2tx - t^2} = \sum_{n=0}^{\infty} \frac{1}{n!} H_n(x)\, t^n. \tag{2.54} $$

If we write

$$ e^{2tx - t^2} = e^{x^2 - (x-t)^2}, \tag{2.55} $$

we may use Taylor's theorem to find

$$ H_n(x) = \left. \frac{d^n}{dt^n} e^{x^2 - (x-t)^2} \right|_{t=0} = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}, \tag{2.56} $$

which is a useful alternative definition. The first few Hermite polynomials
are

$$
\begin{aligned}
H_0(x) &= 1, \\
H_1(x) &= 2x, \\
H_2(x) &= 4x^2 - 2, \\
H_3(x) &= 8x^3 - 12x, \\
H_4(x) &= 16x^4 - 48x^2 + 12.
\end{aligned}
$$

The normalization is such that

$$ \int_{-\infty}^{\infty} H_n(x) H_m(x)\, e^{-x^2}\,dx = 2^n n! \sqrt{\pi}\,\delta_{nm}, \tag{2.57} $$

as may be proved by using the generating function. The three-term recurrence
relation is

$$ 2x H_n(x) = H_{n+1}(x) + 2n H_{n-1}(x). \tag{2.58} $$
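
As a numerical sanity check (not a substitute for the generating-function proof), the recurrence (2.58) can be used to generate the $H_n$ and the normalization (2.57) can be tested by quadrature. The sketch below assumes that truncating the integral to $[-10, 10]$ and using a Simpson rule with 4000 steps is accurate enough for small $n$. The function names are illustrative.

```python
import math

def hermite(n, x):
    """H_n(x) from the recurrence (2.58): H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 0.0, 1.0          # H_{-1} = 0, H_0 = 1
    for k in range(n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def weighted_overlap(n, m, half_width=10.0, steps=4000):
    """Simpson estimate of the integral of H_n H_m exp(-x^2) over [-half_width, half_width]."""
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * step
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * hermite(n, x) * hermite(m, x) * math.exp(-x * x)
    return total * step / 3.0

def expected(n, m):
    """Right-hand side of (2.57)."""
    return 2 ** n * math.factorial(n) * math.sqrt(math.pi) if n == m else 0.0

for n in range(4):
    print([round(weighted_overlap(n, m), 6) for m in range(4)])
```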


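Exercise 2.3: Evaluate the integral

$$ F(s,t) = \int_{-\infty}^{\infty} e^{-x^2} e^{2sx - s^2} e^{2tx - t^2}\,dx $$

and expand the result as a double power series in $s$ and $t$. By examining the
coefficient of $s^n t^m$, show that

$$ \int_{-\infty}^{\infty} H_n(x) H_m(x)\, e^{-x^2}\,dx = 2^n n! \sqrt{\pi}\,\delta_{nm}. $$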


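Problem 2.4: Let

$$ \varphi_n(x) = \frac{1}{\sqrt{2^n n! \sqrt{\pi}}}\, H_n(x)\, e^{-x^2/2} $$

be the normalized Hermite functions. They form a complete orthonormal set
in $L^2(\mathbb{R})$. Show that

$$ \sum_{n=0}^{\infty} t^n \varphi_n(x) \varphi_n(y) = \frac{1}{\sqrt{\pi(1 - t^2)}} \exp\left\{ \frac{4xyt - (x^2 + y^2)(1 + t^2)}{2(1 - t^2)} \right\}, \qquad 0 \le t < 1. $$

This is Mehler's formula. (Hint: Expand the right hand side as
$\sum_{n=0}^{\infty} a_n(x,t)\,\varphi_n(y)$. To find $a_n(x,t)$, multiply by $e^{2sy - s^2 - y^2/2}$ and
integrate over $y$.)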


Exercise 2.5: Let $\varphi_n(x)$ be the same functions as in the preceding problem.
Define a Fourier-transform operator $F : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ by

$$ F(f) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixs} f(s)\,ds. $$

With this normalization of the Fourier transform, $F^4$ is the identity map. The
possible eigenvalues of $F$ are therefore $\pm 1$, $\pm i$. Starting from (2.56), show that
the $\varphi_n(x)$ are eigenfunctions of $F$, and that

$$ F(\varphi_n) = i^n \varphi_n(x). $$


Tchebychef polynomials


Tchebychef polynomials are defined by taking $a = -1$, $b = +1$ and
$w(x) = (1 - x^2)^{\pm 1/2}$. The Tchebychef polynomials of the first kind are

$$ T_n(x) = \cos(n \cos^{-1} x). \tag{2.59} $$

The first few are

$$
\begin{aligned}
T_0(x) &= 1, \\
T_1(x) &= x, \\
T_2(x) &= 2x^2 - 1, \\
T_3(x) &= 4x^3 - 3x.
\end{aligned}
$$

The Tchebychef polynomials of the second kind are

$$ U_{n-1}(x) = \frac{\sin(n \cos^{-1} x)}{\sin(\cos^{-1} x)} = \frac{1}{n} T_n'(x), \tag{2.60} $$

and the first few are

$$
\begin{aligned}
U_{-1}(x) &= 0, \\
U_0(x) &= 1, \\
U_1(x) &= 2x, \\
U_2(x) &= 4x^2 - 1, \\
U_3(x) &= 8x^3 - 4x.
\end{aligned}
$$

$T_n$ and $U_n$ obey the same recurrence relation

$$
\begin{aligned}
2x T_n &= T_{n+1} + T_{n-1}, \\
2x U_n &= U_{n+1} + U_{n-1},
\end{aligned}
$$

which are disguised forms of elementary trigonometric identities. The
orthogonality is also a disguised form of the orthogonality of the functions $\cos n\theta$
and $\sin n\theta$. After setting $x = \cos\theta$ we have

$$ \int_0^{\pi} \cos n\theta \cos m\theta\,d\theta = \int_{-1}^{1} \frac{1}{\sqrt{1 - x^2}}\, T_n(x) T_m(x)\,dx = h_n \delta_{nm}, \qquad n, m \ge 0, \tag{2.61} $$

where $h_0 = \pi$, $h_n = \pi/2$, $n > 0$, and

$$ \int_0^{\pi} \sin n\theta \sin m\theta\,d\theta = \int_{-1}^{1} \sqrt{1 - x^2}\, U_{n-1}(x) U_{m-1}(x)\,dx = \frac{\pi}{2}\,\delta_{nm}, \qquad n, m > 0. \tag{2.62} $$

The set $\{T_n(x)\}$ is therefore orthogonal and complete in $L^2_{(1-x^2)^{-1/2}}[-1, 1]$,
and the set $\{U_n(x)\}$ is orthogonal and complete in $L^2_{(1-x^2)^{1/2}}[-1, 1]$. Any
function continuous on the closed interval $[-1, 1]$ lies in both of these spaces,
and can therefore be expanded in terms of either set.
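
The statement that the recurrences are disguised trigonometric identities can be checked directly. This short sketch generates $T_n$ and $U_n$ from the recurrence alone and compares them with (2.59) and (2.60). The sample angle $\theta = 0.7$ is an arbitrary assumption.

```python
import math

def chebyshev_t(n, x):
    """T_n(x) from T_{k+1} = 2x T_k - T_{k-1}, starting from T_0 = 1, T_1 = x."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

def chebyshev_u(n, x):
    """U_n(x) from the same recurrence, starting from U_{-1} = 0, U_0 = 1."""
    u_prev, u = 0.0, 1.0
    for _ in range(n):
        u_prev, u = u, 2.0 * x * u - u_prev
    return u

theta = 0.7  # arbitrary sample angle, chosen only for illustration
x = math.cos(theta)
for n in range(1, 6):
    # T_n(cos theta) = cos(n theta) and U_{n-1}(cos theta) sin(theta) = sin(n theta)
    print(n,
          chebyshev_t(n, x) - math.cos(n * theta),
          chebyshev_u(n - 1, x) * math.sin(theta) - math.sin(n * theta))
```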


2.3. Linear operators and distributions 


Our theme is the analogy between linear differential operators and matrices. 
It is therefore useful to understand how we can think of a differential operator 
as a continuously indexed “matrix.” 


2.3.1 Linear operators


The action of a matrix on a vector $y = Ax$ is given in components by

$$ y_i = A_{ij} x_j. \tag{2.63} $$

The function-space analogue of this, $g = Af$, is naturally to be thought of as

$$ g(x) = \int_a^b A(x,y) f(y)\,dy, \tag{2.64} $$

where the summation over adjacent indices has been replaced by an
integration over the dummy variable $y$. If $A(x,y)$ is an ordinary function then
$A(x,y)$ is called an integral kernel. We will study such linear operators in
the chapter on integral equations.

The identity operation is

$$ f(x) = \int_a^b \delta(x - y) f(y)\,dy, \tag{2.65} $$

and so the Dirac delta function, which is not an ordinary function, plays the
role of the identity matrix. Once we admit distributions such as $\delta(x)$, we can
think of differential operators as continuously indexed matrices by using the
distribution

$$ \delta'(x) = \text{“}\frac{d}{dx}\delta(x)\text{”}. \tag{2.66} $$

Figure 2.3: Smooth approximations to $\delta(x - a)$ and $\delta'(x - a)$.


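The quotes are to warn us that we are not really taking the derivative of the
highly singular delta function. The symbol $\delta'(x)$ is properly defined by its
behaviour in an integral

$$
\begin{aligned}
\int_a^b \delta'(x - y) f(y)\,dy &= \int_a^b \frac{d}{dx}\delta(x - y)\, f(y)\,dy \\
 &= -\int_a^b f(y) \frac{d}{dy}\delta(x - y)\,dy \\
 &= \int_a^b f'(y)\,\delta(x - y)\,dy, \qquad \text{(Integration by parts)} \\
 &= f'(x).
\end{aligned}
$$

The manipulations here are purely formal, and serve only to motivate the
defining property

$$ \int_a^b \delta'(x - y) f(y)\,dy = f'(x). \tag{2.67} $$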


It is, however, sometimes useful to think of a smooth approximation to
$\delta'(x - a)$ being the genuine derivative of a smooth approximation to $\delta(x - a)$,
as illustrated in figure 2.3.

We can now define higher “derivatives” of $\delta(x)$ by

$$ \int_a^b \delta^{(n)}(x) f(x)\,dx = (-1)^n f^{(n)}(0), \tag{2.68} $$

and use them to represent any linear differential operator as a formal integral
kernel.

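Example: In chapter one we formally evaluated a functional second derivative
and ended up with the distributional kernel (1.186), which we here write as

$$
\begin{aligned}
k(x,y) &= -\frac{d}{dy}\left( p(y) \frac{d}{dy}\delta(y - x) \right) + q(y)\,\delta(y - x) \\
 &= -p(y)\,\delta''(y - x) - p'(y)\,\delta'(y - x) + q(y)\,\delta(y - x).
\end{aligned}
\tag{2.69}
$$

When $k$ acts on a function $u$, it gives

$$
\begin{aligned}
\int k(x,y) u(y)\,dy &= \int \left\{ -p(y)\,\delta''(y - x) - p'(y)\,\delta'(y - x) + q(y)\,\delta(y - x) \right\} u(y)\,dy \\
 &= \int \delta(y - x) \left\{ -[p(y)u(y)]'' + [p'(y)u(y)]' + q(y)u(y) \right\} dy \\
 &= \int \delta(y - x) \left\{ -p(y)u''(y) - p'(y)u'(y) + q(y)u(y) \right\} dy \\
 &= -\frac{d}{dx}\left( p(x) \frac{du}{dx} \right) + q(x)u(x).
\end{aligned}
\tag{2.70}
$$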


The continuous matrix (1.186) therefore does, as indicated in chapter one, 
represent the Sturm-Liouville operator L defined in (1.182). 


Exercise 2.6: Consider the distributional kernel

$$ k(x,y) = a_2(y)\,\delta''(x - y) + a_1(y)\,\delta'(x - y) + a_0(y)\,\delta(x - y). $$

Show that

$$ \int k(x,y) u(y)\,dy = (a_2(x)u(x))'' + (a_1(x)u(x))' + a_0(x)u(x). $$

Similarly show that

$$ k(x,y) = a_2(x)\,\delta''(x - y) + a_1(x)\,\delta'(x - y) + a_0(x)\,\delta(x - y) $$

leads to

$$ \int k(x,y) u(y)\,dy = a_2(x)u''(x) + a_1(x)u'(x) + a_0(x)u(x). $$


Exercise 2.7: The distributional kernel (2.69) was originally obtained as a
functional second derivative

$$
\begin{aligned}
k(x_1, x_2) &= \frac{\delta}{\delta y(x_1)} \left( \frac{\delta J[y]}{\delta y(x_2)} \right) \\
 &= -\frac{d}{dx_2}\left( p(x_2) \frac{d}{dx_2}\delta(x_2 - x_1) \right) + q(x_2)\,\delta(x_2 - x_1).
\end{aligned}
$$

By analogy with conventional partial derivatives, we would expect that

$$ \frac{\delta}{\delta y(x_1)} \left( \frac{\delta J[y]}{\delta y(x_2)} \right) = \frac{\delta}{\delta y(x_2)} \left( \frac{\delta J[y]}{\delta y(x_1)} \right), $$

but $x_1$ and $x_2$ appear asymmetrically in $k(x_1, x_2)$. Define

$$ k^T(x_1, x_2) = k(x_2, x_1), $$

and show that

$$ \int k^T(x_1, x_2) u(x_2)\,dx_2 = \int k(x_1, x_2) u(x_2)\,dx_2. $$

Conclude that, superficial appearance notwithstanding, we do have
$k(x_1, x_2) = k(x_2, x_1)$.


The example and exercises show that linear differential operators correspond 
to continuously-infinite matrices having entries only infinitesimally close to 
their main diagonal. 
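
A discretized version makes the “entries close to the diagonal” picture concrete. The following sketch is an illustration under stated assumptions, not the text's construction: $\delta'(x - y)$ is replaced by a centred difference of two unit-area spikes of width $h$ on a uniform grid, and the integral over $y$ is replaced by a Riemann sum with weight $h$. The names `derivative_kernel` and `apply_kernel` are hypothetical.

```python
import math

def derivative_kernel(size, h):
    """Discrete stand-in for delta'(x - y): a centred difference of unit-area spikes."""
    kernel = [[0.0] * size for _ in range(size)]
    for i in range(1, size - 1):
        kernel[i][i + 1] = 1.0 / (2.0 * h * h)    # spike of height 1/h at y = x + h
        kernel[i][i - 1] = -1.0 / (2.0 * h * h)   # spike of height 1/h at y = x - h
    return kernel

def apply_kernel(kernel, samples, h):
    """g(x_i) = sum_j K(x_i, y_j) f(y_j) h, the discretized form of (2.64)."""
    return [h * sum(k_ij * f_j for k_ij, f_j in zip(row, samples)) for row in kernel]

h = 0.01
grid = [i * h for i in range(201)]               # sample points on [0, 2]
f = [math.sin(x) for x in grid]
K = derivative_kernel(len(grid), h)
g = apply_kernel(K, f, h)

# Compare with the exact derivative at interior points (the end rows are left zero).
worst = max(abs(g[i] - math.cos(grid[i])) for i in range(1, len(grid) - 1))
print(worst)
```

The matrix `K` has non-zero entries only on the two diagonals adjacent to the main one, and those entries grow like $1/h^2$ as the grid is refined, which is the finite-dimensional shadow of the singular kernel.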


2.3.2 Distributions and test-functions 


It is possible to work most of the problems in this book with no deeper
understanding of what a delta-function is than that presented in section 2.3.1. At
some point however, the more careful reader will wonder about the logical
structure of what we are doing, and will soon discover that too free a use
of $\delta(x)$ and its derivatives can lead to paradoxes. How do such creatures fit
into the function-space picture, and what sort of manipulations with them
are valid?

We often think of $\delta(x)$ as being a “limit” of a sequence of functions whose
graphs are getting narrower and narrower while their height grows to keep
the area under the curve fixed. An example would be the spike function
$\delta_\varepsilon(x - a)$ appearing in figure 2.4.

Figure 2.4: Approximation $\delta_\varepsilon(x - a)$ to $\delta(x - a)$ (a spike of height $1/\varepsilon$ centred on $x = a$).

The $L^2$ norm of $\delta_\varepsilon$,

$$ \|\delta_\varepsilon\|^2 = \int |\delta_\varepsilon(x)|^2\,dx = \frac{1}{\varepsilon}, \tag{2.71} $$

tends to infinity as $\varepsilon \to 0$, so $\delta_\varepsilon$ cannot be tending to any function in $L^2$.
This delta function has infinite “length,” and so is not an element of our
Hilbert space.

The simple spike is not the only way to construct a delta function. In
Fourier theory we meet

$$ \delta_\Lambda(x) = \int_{-\Lambda}^{\Lambda} e^{ikx}\,\frac{dk}{2\pi} = \frac{1}{\pi}\,\frac{\sin \Lambda x}{x}, \tag{2.72} $$

which becomes a delta-function when $\Lambda$ becomes large. In this case

$$ \|\delta_\Lambda\|^2 = \int_{-\infty}^{\infty} \frac{\sin^2 \Lambda x}{\pi^2 x^2}\,dx = \Lambda/\pi. \tag{2.73} $$

Again the “limit” has infinite length and cannot be accommodated in Hilbert
space. This $\delta_\Lambda(x)$ is even more pathological than $\delta_\varepsilon$. It provides a salutary
counter-example to the often asserted “fact” that $\delta(x) = 0$ for $x \neq 0$. As
$\Lambda$ becomes large $\delta_\Lambda(0)$ diverges to infinity. At any fixed non-zero $x$,
however, $\delta_\Lambda(x)$ oscillates between $\pm 1/(\pi x)$ as $\Lambda$ grows. Consequently the limit
$\lim_{\Lambda\to\infty} \delta_\Lambda(x)$ exists nowhere. It therefore makes no sense to assign a
numerical value to $\delta(x)$ at any $x$.
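
The contrast between pointwise behaviour and behaviour under an integral can be illustrated numerically. The sketch below assumes the Gaussian $\varphi(x) = e^{-x^2}$ as a sample smooth function (it is not singled out by the text). For this choice the integral of $\delta_\Lambda \varphi$ is known in closed form to be $\operatorname{erf}(\Lambda/2)$, which tends to $\varphi(0) = 1$.

```python
import math

def delta_lambda(x, lam):
    """delta_Lambda(x) = sin(Lambda x) / (pi x), with its limiting value at x = 0."""
    if x == 0.0:
        return lam / math.pi
    return math.sin(lam * x) / (math.pi * x)

def pair_with_gaussian(lam, half_width=8.0, steps=8000):
    """Simpson estimate of the integral of delta_Lambda(x) exp(-x^2) dx."""
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = (i - steps // 2) * step      # hits x = 0 exactly at the midpoint
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * delta_lambda(x, lam) * math.exp(-x * x)
    return total * step / 3.0

for lam in (1.0, 5.0, 20.0):
    # The pairing settles down to phi(0) = 1, yet the value at x = 0.5 keeps oscillating.
    print(lam, pair_with_gaussian(lam), delta_lambda(0.5, lam))
```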

Given its wild behaviour, it is not surprising that mathematicians looked
askance at Dirac's $\delta(x)$. It was only in 1944, long after its effectiveness in
solving physics and engineering problems had become an embarrassment,
that Laurent Schwartz was able to tame $\delta(x)$ by creating his theory of
distributions. Using the language of distributions we can state precisely the
conditions under which a manoeuvre involving singular objects such as $\delta'(x)$
is legitimate.

Schwartz' theory is built on a concept from linear algebra. Recall that
the dual space $V^*$ of a vector space $V$ is the vector space of linear functions
from the original vector space $V$ to the field over which it is defined. We
consider $\delta(x)$ to be an element of the dual space of a vector space $\mathcal{T}$ of test
functions. When a test function $\varphi(x)$ is plugged in, the $\delta$-machine returns
the number $\varphi(0)$. This operation is a linear map because the action of $\delta$ on
$\lambda\varphi(x) + \mu\chi(x)$ is to return $\lambda\varphi(0) + \mu\chi(0)$. Test functions are smooth (infinitely
differentiable) functions that tend rapidly to zero at infinity. Exactly what
class of function we choose for $\mathcal{T}$ depends on the problem at hand. If we are
going to make extensive use of Fourier transforms, for example, we might
select the Schwartz space, $\mathcal{S}(\mathbb{R})$. This is the space of infinitely differentiable
functions $\varphi(x)$ such that the seminorms³

$$ |\varphi|_{m,n} = \sup_{x \in \mathbb{R}} \left\{ |x|^n \left| \frac{d^m \varphi}{dx^m} \right| \right\} \tag{2.74} $$

are finite for all positive integers $m$ and $n$. The Schwartz space has the
advantage that if $\varphi$ is in $\mathcal{S}(\mathbb{R})$, then so is its Fourier transform. Another
popular space of test functions is $\mathcal{D}$, consisting of $C^\infty$ functions of compact
support, meaning that each function is identically zero outside some finite
interval. Only if we want to prove theorems is a precise specification of $\mathcal{T}$
essential. For most physics calculations infinite differentiability and a rapid
enough decrease at infinity for us to be able to ignore boundary terms is all
that we need.

The “nice” behaviour of the test functions compensates for the “nasty”
behaviour of $\delta(x)$ and its relatives. The objects, such as $\delta(x)$, composing the
dual space of $\mathcal{T}$ are called generalized functions, or distributions. Actually,
not every linear map $\mathcal{T} \to \mathbb{R}$ is to be included in the dual space because,
for technical reasons, we must require the maps to be continuous. In other
words, if $\varphi_n \to \varphi$, we want our distributions $u$ to obey $u(\varphi_n) \to u(\varphi)$. Making
precise what we mean by $\varphi_n \to \varphi$ is part of the task of specifying $\mathcal{T}$. In the
Schwartz space, for example, we declare that $\varphi_n \to \varphi$ if $|\varphi_n - \varphi|_{n,m} \to 0$, for
all positive $m$, $n$. When we restrict a dual space to continuous functionals,
we usually denote it by $V'$ rather than $V^*$. The space of distributions is
therefore $\mathcal{T}'$.

³ A seminorm $|\cdots|$ has all the properties of a norm except that $|\varphi| = 0$ does not imply
that $\varphi = 0$.

When they wish to stress the dual-space aspect of distribution theory,
mathematically-minded authors use the notation

$$ \delta(\varphi) = \varphi(0), \tag{2.75} $$

or

$$ (\delta, \varphi) = \varphi(0), \tag{2.76} $$

in place of the common, but purely formal,

$$ \int \delta(x)\varphi(x)\,dx = \varphi(0). \tag{2.77} $$

The expression $(\delta, \varphi)$ here represents the pairing of the element $\varphi$ of the
vector space $\mathcal{T}$ with the element $\delta$ of its dual space $\mathcal{T}'$. It should not be
thought of as an inner product as the distribution and the test function lie in
different spaces. The “integral” in the common notation is purely symbolic,
of course, but the common notation should not be despised even by those in
quest of rigour. It suggests correct results, such as

$$ \int \delta(ax - b)\varphi(x)\,dx = \frac{1}{|a|}\,\varphi(b/a), \tag{2.78} $$

which would look quite unmotivated in the dual-space notation.
The distribution $\delta'(x)$ is now defined by the pairing

$$ (\delta', \varphi) = -\varphi'(0), \tag{2.79} $$

where the minus sign comes from imagining an integration by parts that
takes the “derivative” off $\delta(x)$ and puts it on to the smooth function $\varphi(x)$:

$$ \text{“}\int \delta'(x)\varphi(x)\,dx\text{”} = -\int \delta(x)\varphi'(x)\,dx. \tag{2.80} $$

Similarly $\delta^{(n)}(x)$ is now defined by the pairing

$$ (\delta^{(n)}, \varphi) = (-1)^n \varphi^{(n)}(0). \tag{2.81} $$

The “nicer” the class of test function we take, the “nastier” the class
of distributions we can handle. For example, the Hilbert space $L^2$ is its
own dual: the Riesz-Fréchet theorem (see exercise 2.10) asserts that any
continuous linear map $F : L^2 \to \mathbb{R}$ can be written as $F[f] = \langle l, f\rangle$ for some
$l \in L^2$. The delta-function map is not continuous when considered as a
map from $L^2 \to \mathbb{R}$ however. An arbitrarily small change, $f \to f + \delta f$, in a
function (small in the $L^2$ sense of $\|\delta f\|$ being small) can produce an arbitrarily
large change in $f(0)$. Thus $L^2$ functions are not “nice” enough for their
dual space to be able to accommodate the delta function. Another way of
understanding this is to remember that we regard two $L^2$ functions as being
the same whenever $\|f_1 - f_2\| = 0$. This distance will be zero even if $f_1$
and $f_2$ differ from one another on a countable set of points. As we have
remarked earlier, this means that elements of $L^2$ are not really functions
at all: they do not have an assigned value at each point. They are,
instead, only equivalence classes of functions. Since $f(0)$ is undefined, any
attempt to interpret the statement $\int \delta(x) f(x)\,dx = f(0)$ for $f$ an arbitrary
element of $L^2$ is necessarily doomed to failure. Continuous functions, however,
do have well-defined values at every point. If we take the space of test
functions $\mathcal{T}$ to consist of all continuous functions, but do not demand that
they be differentiable, then $\mathcal{T}'$ will include the delta function, but not its
“derivative” $\delta'(x)$, as this requires us to evaluate $f'(0)$. If we require the test
functions to be once-differentiable, then $\mathcal{T}'$ will include $\delta'(x)$ but not $\delta''(x)$,
and so on.


When we add suitable spaces $\mathcal{T}$ and $\mathcal{T}'$ to our toolkit, we are constructing
what is called a rigged⁴ Hilbert space. In such a rigged space we have the
inclusion

$$ \mathcal{T} \subset L^2 \equiv [L^2]' \subset \mathcal{T}'. \tag{2.82} $$

The idea is to take the space $\mathcal{T}'$ big enough to contain objects such as the
limit of our sequence of “approximate” delta functions $\delta_\varepsilon$, which does not
converge to anything in $L^2$.

⁴ “Rigged” as in a sailing ship ready for sea, not “rigged” as in a corrupt election.

Ordinary functions can also be regarded as distributions, and this helps
illuminate the different senses in which a sequence $u_n$ can converge. For
example, we can consider the functions

$$ u_n = \sin n\pi x, \qquad 0 < x < 1, \tag{2.83} $$

as being either elements of $L^2[0,1]$ or as distributions. As distributions we
evaluate them on a smooth function $\varphi$ as

$$ (u_n, \varphi) = \int_0^1 \varphi(x) u_n(x)\,dx. \tag{2.84} $$

Now

$$ \lim_{n\to\infty} (u_n, \varphi) = 0, \tag{2.85} $$

since the high-frequency Fourier coefficients of any smooth function tend
to zero. We deduce that as a distribution we have $\lim_{n\to\infty} u_n = 0$, the
convergence being pointwise on the space of test functions. Considered as
elements of $L^2[0,1]$, however, the $u_n$ do not tend to zero. Their norm is
$\|u_n\| = 1/\sqrt{2}$ and so all the $u_n$ remain at the same fixed distance from 0.
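
The two senses of convergence can be compared numerically. This is a minimal sketch assuming the sample smooth function $\varphi(x) = x(1 - x)$ (an illustrative choice, not one made in the text). For this $\varphi$ the pairing is $4/(n\pi)^3$ for odd $n$ and zero for even $n$.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def pairing(n):
    """(u_n, phi) for u_n = sin(n pi x) and the sample function phi(x) = x(1 - x)."""
    return simpson(lambda x: x * (1.0 - x) * math.sin(n * math.pi * x), 0.0, 1.0)

def norm(n):
    """L^2[0,1] norm of u_n."""
    return math.sqrt(simpson(lambda x: math.sin(n * math.pi * x) ** 2, 0.0, 1.0))

for n in (1, 5, 25, 125):
    # The pairing shrinks towards zero while the norm stays at 1/sqrt(2).
    print(n, pairing(n), norm(n))
```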


Exercise 2.8: Here we show that the elements of $L^2[a,b]$, which we defined
in exercise 2.2 to be the formal limits of Cauchy sequences of continuous
functions, may be thought of as distributions.

i) Let $\varphi(x)$ be a test function and $f_n(x)$ a Cauchy sequence of continuous
   functions defining $f \in L^2$. Use the Cauchy-Schwarz-Bunyakovsky
   inequality to show that the sequence of numbers $\langle \varphi, f_n\rangle$ is Cauchy and so
   deduce that $\lim_{n\to\infty} \langle \varphi, f_n\rangle$ exists.

ii) Let $\varphi(x)$ be a test function and $f_n^{(1)}(x)$ and $f_n^{(2)}(x)$ be a pair of
   equivalent sequences defining the same element $f \in L^2$. Use
   Cauchy-Schwarz-Bunyakovsky to show that

$$ \lim_{n\to\infty} \langle \varphi, f_n^{(1)} - f_n^{(2)}\rangle = 0. $$

   Combine this result with that of the preceding exercise to deduce that
   we can set

$$ (\varphi, f) \equiv \lim_{n\to\infty} \langle \varphi^*, f_n\rangle, $$

   and so define $f \equiv \lim_{n\to\infty} f_n$ as a distribution.


The interpretation of elements of $L^2$ as distributions is simultaneously simpler
and more physical than the classical interpretation via the Lebesgue integral.


Weak derivatives


By exploiting the infinite differentiability of our test functions, we were able
to make mathematical sense of the “derivative” of the highly singular delta
function. The same idea of a formal integration by parts can be used to
define the “derivative” for any distribution, and also for ordinary functions
that would not usually be regarded as being differentiable.

We therefore define the weak or distributional derivative $v(x)$ of a
distribution $u(x)$ by requiring its evaluation on a test function $\varphi \in \mathcal{T}$ to be

$$ \int v(x)\varphi(x)\,dx \stackrel{\rm def}{=} -\int u(x)\varphi'(x)\,dx. \tag{2.86} $$

In the more formal pairing notation we write

$$ (v, \varphi) \stackrel{\rm def}{=} -(u, \varphi'). \tag{2.87} $$

The right hand side of (2.87) is a continuous linear function of $\varphi$, and so,
therefore, is the left hand side. Thus the weak derivative $u' \equiv v$ is a
well-defined distribution for any $u$.

When $u(x)$ is an ordinary function that is differentiable in the conventional
sense, its weak derivative coincides with the usual derivative. When
the function is not conventionally differentiable the weak derivative still
exists, but does not assign a numerical value to the derivative at each point. It
is therefore a distribution and not a function.

The elements of $L^2$ are not quite functions, having no well-defined
value at a point, but are particularly mild-mannered distributions, and
their weak derivatives may themselves be elements of $L^2$. It is in this weak
sense that we will, in later chapters, allow differential operators to act on $L^2$
“functions.”

Example: In the weak sense

$$ \frac{d}{dx}|x| = \operatorname{sgn}(x), \tag{2.88} $$

$$ \frac{d}{dx}\operatorname{sgn}(x) = 2\delta(x). \tag{2.89} $$

The object $|x|$ is an ordinary function, but $\operatorname{sgn}(x)$ has no definite value at
$x = 0$, whilst $\delta(x)$ has no definite value at any $x$.

Example: As a more subtle illustration, consider the weak derivative of the
function $\ln|x|$. With $\varphi(x)$ a test function, the improper integral

$$ I = -\int_{-\infty}^{\infty} \varphi'(x) \ln|x|\,dx \equiv -\lim_{\varepsilon, \varepsilon' \to 0} \left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon'}^{\infty} \right) \varphi'(x) \ln|x|\,dx \tag{2.90} $$

is convergent and defines the pairing $(-\ln|x|, \varphi')$. We wish to integrate by
parts and interpret the result as $([\ln|x|]', \varphi)$. The logarithm is differentiable
in the conventional sense away from $x = 0$, and

$$ [\ln|x|\,\varphi(x)]' = \frac{1}{x}\varphi(x) + \ln|x|\,\varphi'(x), \qquad x \neq 0. \tag{2.91} $$

From this we find that

$$ -(\ln|x|, \varphi') = \lim_{\varepsilon, \varepsilon' \to 0} \left\{ \left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon'}^{\infty} \right) \frac{1}{x}\varphi(x)\,dx + \Big( \varphi(\varepsilon') \ln|\varepsilon'| - \varphi(-\varepsilon) \ln|\varepsilon| \Big) \right\}. \tag{2.92} $$

So far $\varepsilon$ and $\varepsilon'$ are unrelated except in that they are both being sent to zero.
If, however, we choose to make them equal, $\varepsilon = \varepsilon'$, then the integrated-out
part becomes

$$ \big( \varphi(\varepsilon) - \varphi(-\varepsilon) \big) \ln|\varepsilon| \sim 2\varphi'(0)\,\varepsilon \ln|\varepsilon|, \tag{2.93} $$

and this tends to zero as $\varepsilon$ becomes small. In this case

$$ -(\ln|x|, \varphi') = \lim_{\varepsilon \to 0} \left\{ \left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon}^{\infty} \right) \frac{1}{x}\varphi(x)\,dx \right\}. \tag{2.94} $$

By the definition of the weak derivative, the left hand side of (2.94) is the
pairing $([\ln|x|]', \varphi)$. We conclude that

$$ \frac{d}{dx}\ln|x| = P\left( \frac{1}{x} \right), \tag{2.95} $$

where $P(1/x)$, the principal-part distribution, is defined by the right-hand
side of (2.94). It is evaluated on the test function $\varphi(x)$ by forming $\int \varphi(x)/x\,dx$,
but with an infinitesimal interval from $-\varepsilon$ to $+\varepsilon$ omitted from the range
of integration. It is essential that this omitted interval lie symmetrically
about the dangerous point $x = 0$. Otherwise the integrated-out part will
not vanish in the $\varepsilon \to 0$ limit. The resulting principal-part integral, written
$P\!\int \varphi(x)/x\,dx$, is then convergent and $P(1/x)$ is a well-defined distribution
despite the singularity in the integrand. Principal-part integrals are common
in physics. We will next meet them when we study Green functions.
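
Equation (2.94) can be tested numerically for a particular test function. The sketch below is an illustration under stated assumptions: it takes $\varphi(x) = e^{-(x-1)^2}$, folds both integrals onto $x > 0$ so that the excised interval is automatically symmetric, and uses a midpoint rule that never samples $x = 0$. Because of the logarithmic singularity the second integral converges only slowly, so agreement to about four decimal places is all that should be expected.

```python
import math

def phi(x):
    """Sample test function (an assumption for illustration): a Gaussian centred at x = 1."""
    return math.exp(-(x - 1.0) ** 2)

def dphi(x):
    """Derivative of phi."""
    return -2.0 * (x - 1.0) * phi(x)

def midpoint(func, upper, steps=100_000):
    """Midpoint rule on (0, upper); it never evaluates func at the singular point x = 0."""
    step = upper / steps
    return step * sum(func((i + 0.5) * step) for i in range(steps))

UPPER = 12.0  # phi is negligible beyond this range

# Right-hand side of (2.94): folding x -> -x pairs the two halves symmetrically about 0.
principal_part = midpoint(lambda x: (phi(x) - phi(-x)) / x, UPPER)

# Left-hand side: -(ln|x|, phi'), folded onto x > 0 in the same way.
minus_log_pairing = -midpoint(lambda x: (dphi(x) + dphi(-x)) * math.log(x), UPPER)

print(principal_part, minus_log_pairing)
```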

For further reading on distributions and their applications we recommend
M. J. Lighthill Fourier Analysis and Generalised Functions, or F. G.
Friedlander Introduction to the Theory of Distributions. Both books are published
by Cambridge University Press.


2.4 Further exercises and problems


The first two exercises lead the reader through a proof of the Riesz-Fréchet
theorem. Although not an essential part of our story, they demonstrate how
“completeness” is used in Hilbert space theory, and provide some practice
with “$\varepsilon, \delta$” arguments for those who desire it.


Exercise 2.9: Show that if a norm $\|\ \|$ is derived from an inner product, then
it obeys the parallelogram law

$$ \|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2). $$

Let $N$ be a complete linear subspace of a Hilbert space $H$. Let $g \notin N$, and let

$$ \inf_{f \in N} \|g - f\| = d. $$

Show that there exists a sequence $f_n \in N$ such that $\lim_{n\to\infty} \|f_n - g\| = d$.
Use the parallelogram law to show that the sequence $f_n$ is Cauchy, and hence
deduce that there is a unique $f \in N$ such that $\|g - f\| = d$. From this,
conclude that $d > 0$. Now show that $\langle (g - f), h\rangle = 0$ for all $h \in N$.


Exercise 2.10: Riesz-Fréchet theorem. Let $L[h]$ be a continuous linear
functional on a Hilbert space $H$. Here continuous means that

$$ \|h_n - h\| \to 0 \;\Rightarrow\; L[h_n] \to L[h]. $$

Show that the set $N = \{f \in H : L[f] = 0\}$ is a complete linear subspace of $H$.
Suppose now that there is a $g \in H$ such that $L[g] \neq 0$, and let $l \in H$ be the
vector “$g - f$” from the previous problem. Show that

$$ L[h] = \langle \alpha l, h\rangle, \quad \text{where} \quad \alpha = L[g]/\langle l, g\rangle = L[g]/\|l\|^2. $$

A continuous linear functional can therefore be expressed as an inner product.


Next we have some problems on orthogonal polynomials and three-term re- 
currence relations. They provide an excuse for reviewing linear algebra, and 
also serve to introduce the theory behind some practical numerical methods. 


Exercise 2.11: Let $\{P_n(x)\}$ be a family of polynomials orthonormal on $[a,b]$
with respect to a positive weight function $w(x)$, and with $\deg[P_n(x)] = n$.
Let us also scale $w(x)$ so that $\int_a^b w(x)\,dx = 1$, and $P_0(x) = 1$.

a) Suppose that the $P_n(x)$ obey the three-term recurrence relation

$$ x P_n(x) = b_n P_{n+1}(x) + a_n P_n(x) + b_{n-1} P_{n-1}(x); \qquad P_{-1}(x) = 0, \quad P_0(x) = 1. $$

   Define

$$ p_n(x) = P_n(x)\,(b_{n-1} b_{n-2} \cdots b_0), $$

   and show that

$$ x p_n(x) = p_{n+1}(x) + a_n p_n(x) + b_{n-1}^2 p_{n-1}(x); \qquad p_{-1}(x) = 0, \quad p_0(x) = 1. $$

   Conclude that the $p_n(x)$ are monic, i.e. the coefficient of their leading
   power of $x$ is unity.

b) Show also that the functions

$$ q_n(x) = \int_a^b \frac{p_n(x) - p_n(\xi)}{x - \xi}\, w(\xi)\,d\xi $$

   are degree $n - 1$ monic polynomials that obey the same recurrence relation
   as the $p_n(x)$, but with initial conditions $q_0(x) = 0$, $q_1(x) \equiv \int_a^b w\,dx = 1$.

Warning: while the $q_n(x)$ polynomials defined in part b) turn out to be very
useful, they are not mutually orthogonal with respect to $\langle\ ,\ \rangle_w$.
Exercise 2.12: Gaussian quadrature. Orthogonal polynomials have application
to numerical integration. Let the polynomials $\{P_n(x)\}$ be orthonormal on $[a,b]$
with respect to the positive weight function $w(x)$, and let $x_\nu$, $\nu = 1, \ldots, N$ be
the zeros of $P_N(x)$. You will show that if we define the weights

$$ w_\nu = \int_a^b \frac{P_N(x)}{P_N'(x_\nu)(x - x_\nu)}\, w(x)\,dx $$

then the approximate integration scheme

$$ \int_a^b f(x) w(x)\,dx \approx w_1 f(x_1) + w_2 f(x_2) + \cdots + w_N f(x_N), $$

known as Gauss' quadrature rule, is exact for $f(x)$ any polynomial of degree
less than or equal to $2N - 1$.

a) Let $\pi(x) = (x - \xi_1)(x - \xi_2) \cdots (x - \xi_N)$ be a polynomial of degree $N$.
   Given a function $F(x)$, show that

$$ F_L(x) \stackrel{\rm def}{=} \sum_{\nu=1}^{N} F(\xi_\nu)\, \frac{\pi(x)}{\pi'(\xi_\nu)(x - \xi_\nu)} $$

   is a polynomial of degree $N - 1$ that coincides with $F(x)$ at $x = \xi_\nu$,
   $\nu = 1, \ldots, N$. (This is Lagrange's interpolation formula.)

b) Show that if $F(x)$ is a polynomial of degree $N - 1$ or less then $F(x) = F_L(x)$.

c) Let $f(x)$ be a polynomial of degree $2N - 1$ or less. Cite the polynomial
   division algorithm to show that there exist polynomials $Q(x)$ and $R(x)$,
   each of degree $N - 1$ or less, such that

$$ f(x) = P_N(x) Q(x) + R(x). $$

d) Show that $f(x_\nu) = R(x_\nu)$, and that

$$ \int_a^b f(x) w(x)\,dx = \int_a^b R(x) w(x)\,dx. $$

e) Combine parts a), b) and d) to establish Gauss' result.

f) Show that if we normalize $w(x)$ so that $\int w\,dx = 1$ then the weights $w_\nu$
   can be expressed as $w_\nu = q_N(x_\nu)/p_N'(x_\nu)$, where $p_n(x)$, $q_n(x)$ are the
   monic polynomials defined in the preceding problem.


The ultimate large-$N$ exactness of Gaussian quadrature can be expressed as
\[
w(x) = \lim_{N\to\infty}\Big\{\sum_{\nu} \delta(x - x_\nu)\,w_\nu\Big\}.
\]
Of course, a sum of Dirac delta-functions can never become a continuous
function in any ordinary sense. The equality holds only after both sides are
integrated against a smooth test function, i.e., when it is considered as a
statement about distributions.


Exercise 2.13: The completeness of a set of polynomials $\{P_n(x)\}$, orthonormal
with respect to the positive weight function $w(x)$, is equivalent to the
statement that
\[
\sum_{n=0}^{\infty} P_n(x)P_n(y) = \frac{1}{w(x)}\,\delta(x - y).
\]
It is useful to have a formula for the partial sums of this infinite series.

Suppose that the polynomials $P_n(x)$ obey the three-term recurrence relation
\[
xP_n(x) = b_n P_{n+1}(x) + a_n P_n(x) + b_{n-1}P_{n-1}(x); \quad P_{-1}(x) = 0, \quad P_0(x) = 1.
\]
Use this recurrence relation, together with its initial conditions, to obtain the
Christoffel-Darboux formula
\[
\sum_{n=0}^{N-1} P_n(x)P_n(y) = \frac{b_{N-1}\,[P_N(x)P_{N-1}(y) - P_{N-1}(x)P_N(y)]}{x - y}.
\]
Exercise 2.14: Again suppose that the polynomials $P_n(x)$ obey the three-term
recurrence relation
\[
xP_n(x) = b_n P_{n+1}(x) + a_n P_n(x) + b_{n-1}P_{n-1}(x); \quad P_{-1}(x) = 0, \quad P_0(x) = 1.
\]
Consider the $N$-by-$N$ tridiagonal matrix eigenvalue problem
\[
\begin{pmatrix}
a_{N-1} & b_{N-2} & 0 & 0 & \cdots & 0 \\
b_{N-2} & a_{N-2} & b_{N-3} & 0 & \cdots & 0 \\
0 & b_{N-3} & a_{N-3} & b_{N-4} & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & b_2 & a_2 & b_1 & 0 \\
0 & \cdots & 0 & b_1 & a_1 & b_0 \\
0 & \cdots & 0 & 0 & b_0 & a_0
\end{pmatrix}
\begin{pmatrix} u_{N-1} \\ u_{N-2} \\ u_{N-3} \\ \vdots \\ u_2 \\ u_1 \\ u_0 \end{pmatrix}
= x
\begin{pmatrix} u_{N-1} \\ u_{N-2} \\ u_{N-3} \\ \vdots \\ u_2 \\ u_1 \\ u_0 \end{pmatrix}.
\]

a) Show that the eigenvalues $x$ are given by the zeros $x_\nu$, $\nu = 1,\ldots,N$,
of $P_N(x)$, and that the corresponding eigenvectors have components
$u_n = P_n(x_\nu)$, $n = 0,\ldots,N-1$.

b) Take the $x \to y$ limit of the Christoffel-Darboux formula from the preceding
problem, and use it to show that the orthogonality and completeness
relations for the eigenvectors can be written as
\[
\sum_{n=0}^{N-1} P_n(x_\nu)P_n(x_\mu) = w_\nu^{-1}\,\delta_{\nu\mu},
\]
\[
\sum_{\nu=1}^{N} w_\nu\,P_n(x_\nu)P_m(x_\nu) = \delta_{nm}, \quad n, m \le N-1,
\]
where $w_\nu^{-1} = b_{N-1}P_N'(x_\nu)P_{N-1}(x_\nu)$.

c) Use the original Christoffel-Darboux formula to show that, when the
$P_n(x)$ are orthonormal with respect to the positive weight function $w(x)$,
the normalization constants $w_\nu$ of this present problem coincide with the
weights $w_\nu$ occurring in the Gauss quadrature rule. Conclude from this
equality that the Gauss-quadrature weights are positive.
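
The statements of exercises 2.12 and 2.14 can be checked numerically. The
following is only a sketch, under the assumption that we take the Legendre
polynomials on $[-1,1]$ with the normalized weight $w(x) = 1/2$, for which
$a_n = 0$ and $b_n = (n+1)/\sqrt{(2n+1)(2n+3)}$. The function names are
illustrative, and Newton's method with a standard cosine starting guess is
simply one convenient way of locating the zeros $x_\nu$ of $P_N$.

```python
import math

def b(n):
    """b_n for Legendre polynomials orthonormal on [-1, 1] with w(x) = 1/2
    (an assumed example family; for it every a_n vanishes)."""
    return (n + 1) / math.sqrt((2 * n + 1) * (2 * n + 3))

def polys_and_derivs(n_max, x):
    """Return ([P_0..P_nmax], [P'_0..P'_nmax]) at x, using the recurrence
    x P_n = b_n P_{n+1} + b_{n-1} P_{n-1} and its derivative with respect to x."""
    P, dP = [1.0], [0.0]
    for n in range(n_max):
        if n == 0:
            P.append(x * P[0] / b(0))
            dP.append(P[0] / b(0))
        else:
            P.append((x * P[n] - b(n - 1) * P[n - 1]) / b(n))
            dP.append((P[n] + x * dP[n] - b(n - 1) * dP[n - 1]) / b(n))
    return P, dP

def gauss_rule(N, newton_steps=50):
    """Zeros x_nu of P_N and weights w_nu = 1 / (b_{N-1} P'_N(x_nu) P_{N-1}(x_nu))."""
    nodes, weights = [], []
    for nu in range(1, N + 1):
        x = math.cos(math.pi * (nu - 0.25) / (N + 0.5))  # starting guess
        for _ in range(newton_steps):
            P, dP = polys_and_derivs(N, x)
            x -= P[N] / dP[N]
        P, dP = polys_and_derivs(N, x)
        nodes.append(x)
        weights.append(1.0 / (b(N - 1) * dP[N] * P[N - 1]))
    return nodes, weights

def quadrature(f, N):
    """Approximate the integral of f(x) w(x) over [-1, 1] by the N-point rule."""
    nodes, weights = gauss_rule(N)
    return sum(w * f(x) for x, w in zip(nodes, weights))
```

With $N = 4$ the rule should reproduce $\int_{-1}^{1} x^k\,\tfrac12\,dx$ for every
$k \le 7$ to rounding accuracy, and the weights should sum to unity.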


Exercise 2.15: Write the $N$-by-$N$ tridiagonal matrix eigenvalue problem from
the preceding exercise as $Hu = xu$, and set $d_N(x) = \det(xI - H)$. Similarly
define $d_n(x)$ to be the determinant of the $n$-by-$n$ tridiagonal submatrix with
$x - a_{n-1},\ldots,x - a_0$ along its principal diagonal. Laplace-develop the
determinant $d_n(x)$ about its first row, and hence obtain the recurrence
\[
d_{n+1}(x) = (x - a_n)\,d_n(x) - b_{n-1}^2\,d_{n-1}(x).
\]
Conclude that
\[
\det(xI - H) = p_N(x),
\]
where $p_n(x)$ is the monic orthogonal polynomial obeying
\[
xp_n(x) = p_{n+1}(x) + a_n p_n(x) + b_{n-1}^2\,p_{n-1}(x); \quad p_{-1}(x) = 0, \quad p_0(x) = 1.
\]


Exercise 2.16: Again write the $N$-by-$N$ tridiagonal matrix eigenvalue problem
from the preceding exercises as $Hu = xu$.

a) Show that the lowest and rightmost matrix element
\[
\langle 0|(xI - H)^{-1}|0\rangle \equiv (xI - H)^{-1}_{00}
\]
of the resolvent matrix $(xI - H)^{-1}$ is given by a continued fraction
$G_{N-1,0}(x)$ where, for example,
\[
G_{3,z}(x) = \cfrac{1}{x - a_0 - \cfrac{b_0^2}{x - a_1 - \cfrac{b_1^2}{x - a_2 - \cfrac{b_2^2}{x - a_3 + z}}}}.
\]

b) Use induction on $n$ to show that
\[
G_{n,z}(x) = \frac{q_n(x)\,z + q_{n+1}(x)}{p_n(x)\,z + p_{n+1}(x)},
\]
where $p_n(x)$, $q_n(x)$ are the monic polynomial functions of $x$ defined by
the recurrence relations
\[
xp_n(x) = p_{n+1}(x) + a_n p_n(x) + b_{n-1}^2\,p_{n-1}(x), \quad p_{-1}(x) = 0, \quad p_0(x) = 1,
\]
\[
xq_n(x) = q_{n+1}(x) + a_n q_n(x) + b_{n-1}^2\,q_{n-1}(x), \quad q_0(x) = 0, \quad q_1(x) = 1.
\]

c) Conclude that
\[
\langle 0|(xI - H)^{-1}|0\rangle = \frac{q_N(x)}{p_N(x)}
\]
has a pole singularity when $x$ approaches an eigenvalue $x_\nu$. Show that
the residue of the pole (the coefficient of $1/(x - x_\nu)$) is equal to the
Gauss-quadrature weight $w_\nu$ for $w(x)$, the weight function (normalized
so that $\int w\,dx = 1$) from which the coefficients $a_n$, $b_n$ were derived.
Continued fractions were introduced by John Wallis in his Arithmetica
Infinitorum (1656), as was the recursion formula for their evaluation. Today,
when combined with the output of the next exercise, they provide the
mathematical underpinning of the Haydock recursion method in the band theory
of solids. Haydock's method computes $w(x) = \lim_{N\to\infty}\{\sum_\nu \delta(x - x_\nu)\,w_\nu\}$,
and interprets it as the local density of states that is measured in scanning
tunnelling microscopy.
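
As an illustration of exercise 2.16, the continued fraction can be evaluated
from the bottom up and compared with the ratio built from the two monic
recurrences. This is a minimal sketch: the function names and the sample
coefficients in the usage note are made up for the check, and `a` and `b`
are assumed to hold $a_0,\ldots,a_n$ and $b_0,\ldots,b_{n-1}$.

```python
def continued_fraction(a, b, x, z=0.0):
    """G_{n,z}(x) with n = len(a) - 1, i.e.
    1/(x - a_0 - b_0^2/(x - a_1 - ... - b_{n-1}^2/(x - a_n + z)))."""
    g = x - a[-1] + z
    for n in range(len(a) - 2, -1, -1):
        g = x - a[n] - b[n] ** 2 / g
    return 1.0 / g

def p_and_q(a, b, x):
    """Lists [p_0..p_N] and [q_0..q_N], N = len(a), from the monic recurrences
    with p_{-1} = 0, p_0 = 1 and q_0 = 0, q_1 = 1."""
    p, q = [1.0], [0.0]
    for n in range(len(a)):
        b2 = b[n - 1] ** 2 if n > 0 else 0.0
        p_prev = p[n - 1] if n > 0 else 0.0
        q_prev = q[n - 1] if n > 0 else 0.0
        p.append((x - a[n]) * p[n] - b2 * p_prev)
        # q_1 = 1 is an initial condition; the recurrence takes over from n = 1.
        q.append(1.0 if n == 0 else (x - a[n]) * q[n] - b2 * q_prev)
    return p, q
```

For instance, with `a = [1.0, 2.0, 3.0]`, `b = [0.5, 0.7]` and `x = 10.0`, the
value `continued_fraction(a, b, x)` should agree with `q[3] / p[3]`, and for
non-zero `z` with `(q[2] * z + q[3]) / (p[2] * z + p[3])`.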


Exercise 2.17: The Lanczos tridiagonalization algorithm. Let $V$ be an
$N$-dimensional complex vector space equipped with an inner product $\langle\ ,\ \rangle$ and
let $H: V \to V$ be a hermitian linear operator. Starting from a unit vector $u_0$,
and taking $u_{-1} = 0$, recursively generate the unit vectors $u_n$ and the numbers
$a_n$, $b_n$ and $c_n$, by
\[
Hu_n = b_n u_{n+1} + a_n u_n + c_{n-1}u_{n-1},
\]
where the coefficients
\[
a_n \equiv \langle u_n, Hu_n\rangle, \qquad c_{n-1} \equiv \langle u_{n-1}, Hu_n\rangle,
\]
ensure that $u_{n+1}$ is perpendicular to both $u_n$ and $u_{n-1}$, and
\[
b_n = \| Hu_n - a_n u_n - c_{n-1}u_{n-1} \|,
\]
a positive real number, makes $\|u_{n+1}\| = 1$.

a) Use induction on $n$ to show that $u_{n+1}$, although only constructed to be
perpendicular to the previous two vectors, is in fact (and in the absence
of numerical rounding errors) perpendicular to all $u_m$ with $m \le n$.

b) Show that $a_n$, $c_n$ are real, and that $c_{n-1} = b_{n-1}$.

c) Conclude that $b_{N-1} = 0$, and (provided that no earlier $b_n$ happens to
vanish) that the $u_n$, $n = 0,\ldots,N-1$, constitute an orthonormal basis
for $V$, in terms of which $H$ is represented by the $N$-by-$N$ real-symmetric
tridiagonal matrix $H$ of the preceding exercises.


Because the eigenvalues of a tridiagonal matrix are given by the numerically
easy-to-find zeros of the associated monic polynomial $p_N(x)$, the Lanczos
algorithm provides a computationally efficient way of extracting the eigenvalues
from a large sparse matrix. In theory, the entries in the tridiagonal $H$ can be
computed while retaining only $u_n$, $u_{n-1}$ and $Hu_n$ in memory at any one time.
In practice, with finite precision computer arithmetic, orthogonality with the
earlier $u_m$ is eventually lost, and spurious or duplicated eigenvalues appear.
There exist, however, stratagems for identifying and eliminating these fake
eigenvalues.
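
The recursion of exercise 2.17 can be sketched in a few lines. The version
below is an illustration only: it assumes a small real symmetric matrix stored
as a list of rows, keeps every $u_n$ (so that orthogonality can be inspected)
rather than only the last two, and uses helper names of our own choosing.

```python
import math

def dot(u, v):
    """Real inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(H, u):
    """Matrix-vector product for H stored as a list of rows."""
    return [dot(row, u) for row in H]

def lanczos(H, u0, tol=1e-12):
    """Lanczos recursion for a real symmetric H.
    Returns (a, b, U): the diagonal entries a_n, the norms b_n
    (including the final, vanishing one), and the vectors u_n."""
    norm0 = math.sqrt(dot(u0, u0))
    U = [[x / norm0 for x in u0]]
    a, b = [], []
    for n in range(len(H)):
        w = matvec(H, U[n])
        a.append(dot(U[n], w))
        w = [wi - a[n] * ui for wi, ui in zip(w, U[n])]
        if n > 0:
            # c_{n-1} = b_{n-1} for hermitian H (part b of the exercise)
            w = [wi - b[n - 1] * ui for wi, ui in zip(w, U[n - 1])]
        b.append(math.sqrt(dot(w, w)))
        if b[n] < tol or n == len(H) - 1:
            break
        U.append([wi / b[n] for wi in w])
    return a, b, U
```

For a generic starting vector the returned `a` and `b[:-1]` are the diagonal
and off-diagonal entries of the tridiagonal matrix, the sum of the `a` entries
reproduces the trace of `H`, and the last entry of `b` vanishes to rounding
accuracy.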
The following two problems are “toy” versions of the Lax pair and tau function
constructions that arise in the general theory of soliton equations. They
provide useful practice in manipulating matrices and determinants.


Problem 2.18: The monic orthogonal polynomials $p_i(x)$ have inner products
\[
\langle p_i, p_j\rangle_w \equiv \int p_i(x)p_j(x)w(x)\,dx = h_i\,\delta_{ij},
\]
and obey the recursion relation
\[
xp_i(x) = p_{i+1}(x) + a_i p_i(x) + b_{i-1}^2\,p_{i-1}(x); \quad p_{-1}(x) = 0, \quad p_0(x) = 1.
\]
Write the recursion relation as
\[
Lp = xp,
\]
where
\[
L \equiv \begin{pmatrix}
\ddots & \ddots & \ddots & \ddots & \vdots \\
\cdots & 1 & a_2 & b_1^2 & 0 \\
\cdots & 0 & 1 & a_1 & b_0^2 \\
\cdots & 0 & 0 & 1 & a_0
\end{pmatrix},
\qquad
p \equiv \begin{pmatrix} \vdots \\ p_2 \\ p_1 \\ p_0 \end{pmatrix}.
\]
Suppose that
\[
w(x) = \exp\Big\{-\sum_{n=1}^{\infty} t_n x^n\Big\},
\]
and consider how the $p_i(x)$ and the coefficients $a_i$ and $b_i^2$ vary with the
parameters $t_n$.

a) Show that
\[
\frac{\partial p}{\partial t_n} = M^{(n)}p,
\]
where $M^{(n)}$ is some strictly upper triangular matrix, i.e. all entries on
and below its principal diagonal are zero.

b) By differentiating $Lp = xp$ with respect to $t_n$ show that
\[
\frac{\partial L}{\partial t_n} = [M^{(n)}, L].
\]

c) Compute the matrix elements
\[
\langle i|M^{(n)}|j\rangle \equiv M^{(n)}_{ij} = \Big\langle p_j, \frac{\partial p_i}{\partial t_n}\Big\rangle_w
\]
(note the interchange of the order of $i$ and $j$ in the $\langle\ ,\ \rangle_w$ product) by
differentiating the orthogonality condition $\langle p_i, p_j\rangle_w = h_i\delta_{ij}$. Hence show
that
\[
M^{(n)} = (L^n)_+,
\]
where $(L^n)_+$ denotes the strictly upper triangular projection of the $n$'th
power of $L$, i.e. the matrix $L^n$, but with its diagonal and lower triangular
entries replaced by zero.

Thus
\[
\frac{\partial L}{\partial t_n} = [(L^n)_+, L]
\]
describes a family of deformations of the semi-infinite matrix $L$ that, in some
formal sense, preserve its eigenvalues $x$.


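Problem 2.19: Let the monic polynomials $p_n(x)$ be orthogonal with respect
to the weight function
\[
w(x) = \exp\Big\{-\sum_{n=1}^{\infty} t_n x^n\Big\}.
\]
Define the “tau-function” $\tau_n(t_1, t_2, t_3, \ldots)$ of the parameters $t_i$ to be the
$n$-fold integral
\[
\tau_n(t_1, t_2, \ldots) = \int\!\!\int\!\cdots\!\int dx_1\,dx_2\cdots dx_n\,\Delta^2(x)\,
\exp\Big\{-\sum_{\nu=1}^{n}\sum_{m=1}^{\infty} t_m x_\nu^m\Big\},
\]
where
\[
\Delta(x) = \begin{vmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1 & 1 \\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n & 1
\end{vmatrix}
= \prod_{\nu<\mu}(x_\nu - x_\mu)
\]
is the $n$-by-$n$ Vandermonde determinant.

a) Show that
\[
\begin{vmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1 & 1 \\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n & 1
\end{vmatrix}
=
\begin{vmatrix}
p_{n-1}(x_1) & p_{n-2}(x_1) & \cdots & p_1(x_1) & p_0(x_1) \\
p_{n-1}(x_2) & p_{n-2}(x_2) & \cdots & p_1(x_2) & p_0(x_2) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
p_{n-1}(x_n) & p_{n-2}(x_n) & \cdots & p_1(x_n) & p_0(x_n)
\end{vmatrix}.
\]

b) Combine the identity from part a) with the orthogonality property of the
$p_n(x)$ to show that
\[
p_n(x) = \frac{1}{\tau_n}\int dx_1\,dx_2\cdots dx_n\,\Delta^2(x)\prod_{\mu=1}^{n}(x - x_\mu)\,
\exp\Big\{-\sum_{\nu=1}^{n}\sum_{m=1}^{\infty} t_m x_\nu^m\Big\}
= x^n\,\frac{\tau_n(t_1', t_2', t_3', \ldots)}{\tau_n(t_1, t_2, t_3, \ldots)},
\]
where
\[
t_m' = t_m + \frac{1}{m\,x^m}.
\]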

Here are some exercises on distributions: 


Exercise 2.20: Let $f(x)$ be a continuous function. Observe that $f(x)\delta(x) =
f(0)\delta(x)$. Deduce that
\[
\frac{d}{dx}[f(x)\delta(x)] = f(0)\,\delta'(x).
\]
If $f(x)$ were differentiable we might also have used the product rule to conclude
that
\[
\frac{d}{dx}[f(x)\delta(x)] = f'(x)\delta(x) + f(x)\delta'(x).
\]
Show, by evaluating $f(0)\delta'(x)$ and $f'(x)\delta(x) + f(x)\delta'(x)$ on a test function
$\varphi(x)$, that these two expressions for the derivative of $f(x)\delta(x)$ are equivalent.


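Exercise 2.21: Let $\varphi(x)$ be a test function. Show that
\[
\frac{d}{dt}\Big\{P\!\int_{-\infty}^{\infty}\frac{\varphi(x)}{x - t}\,dx\Big\}
= P\!\int_{-\infty}^{\infty}\frac{\varphi(x) - \varphi(t)}{(x - t)^2}\,dx.
\]
Show further that the right-hand-side of this equation is equal to
\[
-\Big(\frac{d}{dx}P\Big(\frac{1}{x - t}\Big), \varphi\Big)
\equiv P\!\int_{-\infty}^{\infty}\frac{\varphi'(x)}{x - t}\,dx.
\]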


Exercise 2.22: Let $\theta(x)$ be the step function or Heaviside distribution
\[
\theta(x) = \begin{cases}
1, & x > 0, \\
\text{undefined}, & x = 0, \\
0, & x < 0.
\end{cases}
\]
By forming the weak derivative of both sides of the equation
\[
\lim_{\epsilon\to 0^+}\ln(x + i\epsilon) = \ln|x| + i\pi\,\theta(-x),
\]
conclude that
\[
\lim_{\epsilon\to 0^+}\Big(\frac{1}{x + i\epsilon}\Big) = P\Big(\frac{1}{x}\Big) - i\pi\,\delta(x).
\]


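Exercise 2.23: Use induction on $n$ to generalize exercise 2.21 and show that
\[
\frac{d^n}{dt^n}\Big\{P\!\int_{-\infty}^{\infty}\frac{\varphi(x)}{x - t}\,dx\Big\}
= P\!\int_{-\infty}^{\infty}\frac{n!}{(x - t)^{n+1}}
\Big[\varphi(x) - \sum_{m=0}^{n-1}\frac{1}{m!}(x - t)^m\varphi^{(m)}(t)\Big]dx
= P\!\int_{-\infty}^{\infty}\frac{\varphi^{(n)}(x)}{x - t}\,dx.
\]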


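Exercise 2.24: Let the non-local functional $S[f]$ be defined by
\[
S[f] = \frac{1}{4\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
\Big\{\frac{f(x) - f(x')}{x - x'}\Big\}^2 dx\,dx'.
\]
Compute the functional derivative of $S[f]$ and verify that it is given by
\[
\frac{\delta S}{\delta f(x)} = \frac{1}{\pi}\frac{d}{dx}\Big\{P\!\int_{-\infty}^{\infty}\frac{f(x')}{x - x'}\,dx'\Big\}.
\]
See exercise 6.10 for an occurrence of this functional.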


Chapter 3 


Linear Ordinary Differential 
Equations 


In this chapter we will discuss linear ordinary differential equations. We will
not describe tricks for solving any particular equation, but instead focus on
those aspects of the general theory that we will need later.

We will consider either homogeneous equations, $Ly = 0$ with
\[
Ly \equiv p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y, \tag{3.1}
\]
or inhomogeneous equations $Ly = f$. In full,
\[
p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y = f(x). \tag{3.2}
\]
We will begin with homogeneous equations.


3.1 Existence and uniqueness of solutions 


The fundamental result in the theory of differential equations is the existence 
and uniqueness theorem for systems of first order equations. 


3.1.1 Flows for first-order equations 


Let $x^1,\ldots,x^n$ be a system of coordinates in $\mathbb{R}^n$, and let $X^i(x^1, x^2, \ldots, x^n, t)$,
$i = 1,\ldots,n$, be the components of a $t$-dependent vector field. Consider the
system of first-order differential equations
\[
\begin{aligned}
\frac{dx^1}{dt} &= X^1(x^1, x^2, \ldots, x^n, t), \\
\frac{dx^2}{dt} &= X^2(x^1, x^2, \ldots, x^n, t), \\
&\ \ \vdots \\
\frac{dx^n}{dt} &= X^n(x^1, x^2, \ldots, x^n, t).
\end{aligned} \tag{3.3}
\]
For a sufficiently smooth vector field $(X^1, X^2, \ldots, X^n)$ there is a unique
solution $x^i(t)$ for any initial condition $x^i(0) = x^i_0$. Rigorous proofs of this claim,
including a statement of exactly what “sufficiently smooth” means, can be
found in any standard book on differential equations. Here, we will simply
assume the result. It is of course “physically” plausible. Regard the $X^i$ as
being the components of the velocity field in a fluid flow, and the solution
$x^i(t)$ as the trajectory of a particle carried by the flow. A particle initially at
$x^i(0) = x^i_0$ certainly goes somewhere, and unless something seriously
pathological is happening, that “somewhere” will be unique.
Now introduce a single function $y(t)$, and set
\[
\begin{aligned}
x^1 &= y, \\
x^2 &= \dot y, \\
x^3 &= \ddot y, \\
&\ \ \vdots \\
x^n &= y^{(n-1)},
\end{aligned} \tag{3.4}
\]
and, given smooth functions $p_0(t),\ldots,p_n(t)$ with $p_0(t)$ nowhere vanishing,
look at the particular system of equations
\[
\begin{aligned}
\frac{dx^1}{dt} &= x^2, \\
\frac{dx^2}{dt} &= x^3, \\
&\ \ \vdots \\
\frac{dx^{n-1}}{dt} &= x^n, \\
\frac{dx^n}{dt} &= -\frac{1}{p_0}\big(p_1 x^n + p_2 x^{n-1} + \cdots + p_n x^1\big).
\end{aligned} \tag{3.5}
\]
This system is equivalent to the single equation
\[
p_0(t)\frac{d^n y}{dt^n} + p_1(t)\frac{d^{n-1}y}{dt^{n-1}} + \cdots + p_{n-1}(t)\frac{dy}{dt} + p_n(t)\,y(t) = 0. \tag{3.6}
\]
Thus an $n$-th order ordinary differential equation (ODE) can be written as a
first-order equation in $n$ dimensions, and we can exploit the uniqueness result
cited above. We conclude, provided $p_0$ never vanishes, that the differential
equation $Ly = 0$ has a unique solution, $y(t)$, for each set of initial data
$(y(0), \dot y(0), \ddot y(0), \ldots, y^{(n-1)}(0))$. Thus,
i) If $Ly = 0$ and $y(0) = 0$, $\dot y(0) = 0$, $\ddot y(0) = 0$, ..., $y^{(n-1)}(0) = 0$, we
deduce that $y \equiv 0$.
ii) If $y_1(t)$ and $y_2(t)$ obey the same equation $Ly = 0$, and have the same
initial data, then $y_1(t) = y_2(t)$.
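
The rewriting of an $n$-th order equation as the first-order system (3.5) is
also how such equations are usually handed to a numerical integrator. The
sketch below assumes a classical fourth-order Runge-Kutta step (our choice,
not something the text prescribes) and hypothetical names `rhs` and `rk4`;
the coefficient functions $p_0,\ldots,p_n$ are passed as a list of callables.

```python
import math

def rhs(p, t, x):
    """Right-hand side of the system (3.5).
    x = [y, y', ..., y^(n-1)] and p = [p_0, ..., p_n] (callables of t)."""
    n = len(x)
    top = -sum(p[k](t) * x[n - k] for k in range(1, n + 1)) / p[0](t)
    return x[1:] + [top]

def rk4(p, x0, t_end, steps):
    """Integrate the system from t = 0 to t_end with fixed-step RK4."""
    t, x = 0.0, list(x0)
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(p, t, x)
        k2 = rhs(p, t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
        k3 = rhs(p, t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
        k4 = rhs(p, t + h, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
             for xi, c1, c2, c3, c4 in zip(x, k1, k2, k3, k4)]
        t += h
    return x

# Example: y'' + y = 0 with y(0) = 1, y'(0) = 0, whose solution is cos t.
p = [lambda t: 1.0, lambda t: 0.0, lambda t: 1.0]
y_end, dy_end = rk4(p, [1.0, 0.0], 1.0, 1000)
```

The initial data $(y(0), \dot y(0))$ fix the solution uniquely, so `y_end` and
`dy_end` should agree with $\cos 1$ and $-\sin 1$ to high accuracy.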


3.1.2 Linear independence 


In this section we will assume that $p_0$ does not vanish in the region of $x$ we are
interested in, and that all the $p_i$ remain finite and differentiable sufficiently
many times for our formulae to make sense.

Consider an $n$-th order linear differential equation
\[
p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y = 0. \tag{3.7}
\]
The set of solutions of this equation constitutes a vector space because if $y_1(x)$
and $y_2(x)$ are solutions, then so is any linear combination $\lambda y_1(x) + \mu y_2(x)$.
We will show that the dimension of this vector space is $n$. To see that this
is so, let $y_1(x)$ be a solution with initial data
\[
y_1(0) = 1, \quad y_1'(0) = 0, \quad \ldots, \quad y_1^{(n-1)}(0) = 0, \tag{3.8}
\]
let $y_2(x)$ be a solution with
\[
y_2(0) = 0, \quad y_2'(0) = 1, \quad \ldots, \quad y_2^{(n-1)}(0) = 0, \tag{3.9}
\]
and so on, up to $y_n(x)$, which has
\[
y_n(0) = 0, \quad y_n'(0) = 0, \quad \ldots, \quad y_n^{(n-1)}(0) = 1. \tag{3.10}
\]
We claim that the functions $y_i(x)$ are linearly independent. Suppose, to the
contrary, that there are constants $\lambda_1,\ldots,\lambda_n$ such that
\[
0 = \lambda_1 y_1(x) + \lambda_2 y_2(x) + \cdots + \lambda_n y_n(x). \tag{3.11}
\]
Then
\[
0 = \lambda_1 y_1(0) + \lambda_2 y_2(0) + \cdots + \lambda_n y_n(0) \quad\Rightarrow\quad \lambda_1 = 0. \tag{3.12}
\]
Differentiating once and setting $x = 0$ gives
\[
0 = \lambda_1 y_1'(0) + \lambda_2 y_2'(0) + \cdots + \lambda_n y_n'(0) \quad\Rightarrow\quad \lambda_2 = 0. \tag{3.13}
\]
We continue in this manner all the way to
\[
0 = \lambda_1 y_1^{(n-1)}(0) + \lambda_2 y_2^{(n-1)}(0) + \cdots + \lambda_n y_n^{(n-1)}(0) \quad\Rightarrow\quad \lambda_n = 0. \tag{3.14}
\]
All the $\lambda_i$ are zero! There is therefore no non-trivial linear relation between
the $y_i(x)$, and they are indeed linearly independent.

The solutions $y_i(x)$ also span the solution space, because the unique solution
with initial data $y(0) = a_1$, $y'(0) = a_2$, ..., $y^{(n-1)}(0) = a_n$ can be written
in terms of them as
\[
y(x) = a_1 y_1(x) + a_2 y_2(x) + \cdots + a_n y_n(x). \tag{3.15}
\]
Our chosen set of $n$ solutions is therefore a basis for the solution space of the
differential equation. The dimension of the solution space is therefore $n$, as
claimed.


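3.1.3 The Wronskian


If we manage to find a different set of $n$ solutions, how will we know whether
they are also linearly independent? The essential tool is the Wronskian:
\[
W(y_1,\ldots,y_n;x) \stackrel{\rm def}{=}
\begin{vmatrix}
y_1 & y_2 & \cdots & y_n \\
y_1' & y_2' & \cdots & y_n' \\
\vdots & \vdots & \ddots & \vdots \\
y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)}
\end{vmatrix}. \tag{3.16}
\]
Recall that the derivative of a determinant
\[
D = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix} \tag{3.17}
\]
may be evaluated by differentiating row-by-row:
\[
\frac{dD}{dx} =
\begin{vmatrix}
a_{11}' & a_{12}' & \cdots & a_{1n}' \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
+
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21}' & a_{22}' & \cdots & a_{2n}' \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
+ \cdots +
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1}' & a_{n2}' & \cdots & a_{nn}'
\end{vmatrix}.
\]
Applying this to the derivative of the Wronskian, we find
\[
\frac{dW}{dx} =
\begin{vmatrix}
y_1 & y_2 & \cdots & y_n \\
y_1' & y_2' & \cdots & y_n' \\
\vdots & \vdots & \ddots & \vdots \\
y_1^{(n-2)} & y_2^{(n-2)} & \cdots & y_n^{(n-2)} \\
y_1^{(n)} & y_2^{(n)} & \cdots & y_n^{(n)}
\end{vmatrix}. \tag{3.18}
\]
Only the term where the very last row is being differentiated survives. All
the other row derivatives give zero because they lead to a determinant with
two identical rows. Now, if the $y_i$ are all solutions of
\[
p_0 y^{(n)} + p_1 y^{(n-1)} + \cdots + p_n y = 0, \tag{3.19}
\]
we can substitute
\[
y_i^{(n)} = -\frac{1}{p_0}\big(p_1 y_i^{(n-1)} + p_2 y_i^{(n-2)} + \cdots + p_n y_i\big), \tag{3.20}
\]
use the row-by-row linearity of determinants,
\[
\begin{vmatrix}
\lambda a_{11} + \mu b_{11} & \lambda a_{12} + \mu b_{12} & \cdots & \lambda a_{1n} + \mu b_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{vmatrix}
= \lambda
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{vmatrix}
+ \mu
\begin{vmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{vmatrix}, \tag{3.21}
\]
and find, again because most terms have two identical rows, that only the
terms with $p_1$ survive. The end result is
\[
\frac{dW}{dx} = -\Big(\frac{p_1}{p_0}\Big)W. \tag{3.22}
\]
Solving this first order equation gives
\[
W(y_i; x) = W(y_i; x_0)\exp\Big\{-\int_{x_0}^{x}\frac{p_1(\xi)}{p_0(\xi)}\,d\xi\Big\}. \tag{3.23}
\]
Since the exponential function itself never vanishes, $W(x)$ either vanishes at
all $x$, or never. This is Liouville's theorem, and (3.23) is called Liouville's
formula.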
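
Liouville's formula is easy to test on an equation whose solutions are known.
As an assumed example (it does not appear in the text) take the Euler equation
$x^2y'' + xy' - y = 0$, which has solutions $y_1 = x$ and $y_2 = 1/x$ on
$x > 0$, so that $p_1/p_0 = 1/x$. The sketch compares the directly computed
Wronskian with the right-hand side of (3.23), using a Simpson rule for the
integral.

```python
import math

def wronskian(x):
    """W(y1, y2; x) for y1 = x, y2 = 1/x (solutions of x^2 y'' + x y' - y = 0)."""
    y1, dy1 = x, 1.0
    y2, dy2 = 1.0 / x, -1.0 / x**2
    return y1 * dy2 - dy1 * y2

def simpson(g, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3

def liouville(x, x0):
    """Right-hand side of (3.23) with p0 = x^2 and p1 = x."""
    p0 = lambda s: s * s
    p1 = lambda s: s
    return wronskian(x0) * math.exp(-simpson(lambda s: p1(s) / p0(s), x0, x))
```

Here `wronskian(x)` equals $-2/x$, and `liouville(x, 1.0)` should reproduce it
for any $x > 0$; in particular the Wronskian never vanishes on this interval.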

Now suppose that y1,..., Yn are a set of C” functions of x, not necessarily 
solutions of an ODE. Suppose further that there are constants $\lambda_i$, not all zero,
such that
\[
\lambda_1 y_1(x) + \lambda_2 y_2(x) + \cdots + \lambda_n y_n(x) = 0 \tag{3.24}
\]
(i.e. the functions are linearly dependent). Then the set of equations
\[
\begin{aligned}
\lambda_1 y_1(x) + \lambda_2 y_2(x) + \cdots + \lambda_n y_n(x) &= 0, \\
\lambda_1 y_1'(x) + \lambda_2 y_2'(x) + \cdots + \lambda_n y_n'(x) &= 0, \\
&\ \ \vdots \\
\lambda_1 y_1^{(n-1)}(x) + \lambda_2 y_2^{(n-1)}(x) + \cdots + \lambda_n y_n^{(n-1)}(x) &= 0,
\end{aligned} \tag{3.25}
\]
has a non-trivial solution $\lambda_1, \lambda_2, \ldots, \lambda_n$, and so the determinant of the
coefficients,
\[
W = \begin{vmatrix}
y_1 & y_2 & \cdots & y_n \\
y_1' & y_2' & \cdots & y_n' \\
\vdots & \vdots & \ddots & \vdots \\
y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)}
\end{vmatrix}, \tag{3.26}
\]
must vanish. Thus

linear dependence $\Rightarrow$ $W \equiv 0$.

There is a partial converse of this result: Suppose that $y_1,\ldots,y_n$ are solutions
to an $n$-th order ODE and $W(y_i; x) = 0$ at $x = x_0$. Then there must exist a
set of $\lambda_i$, not all zero, such that
\[
Y(x) = \lambda_1 y_1(x) + \lambda_2 y_2(x) + \cdots + \lambda_n y_n(x) \tag{3.27}
\]
has $0 = Y(x_0) = Y'(x_0) = \cdots = Y^{(n-1)}(x_0)$. This is because the system of
linear equations determining the $\lambda_i$ has the Wronskian as its determinant.
Now the function $Y(x)$ is a solution of the ODE and has vanishing initial
data. It is therefore identically zero. We conclude that

ODE and $W = 0$ $\Rightarrow$ linear dependence.


If there is no ODE, the Wronskian may vanish without the functions
being linearly dependent. As an example, consider
\[
y_1(x) = \begin{cases} 0, & x \le 0, \\ \exp\{-1/x^2\}, & x > 0, \end{cases}
\qquad
y_2(x) = \begin{cases} \exp\{-1/x^2\}, & x < 0, \\ 0, & x \ge 0. \end{cases} \tag{3.28}
\]
We have $W(y_1, y_2; x) \equiv 0$, but $y_1$, $y_2$ are not proportional to one another, and
so not linearly dependent. (Note that $y_{1,2}$ are smooth functions. In particular
they have derivatives of all orders at $x = 0$.)
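
A quick numerical look at this counterexample may help. The sketch below (with
helper names of our own) evaluates both functions and their first derivatives
on a grid, using $\frac{d}{dx}\exp\{-1/x^2\} = (2/x^3)\exp\{-1/x^2\}$, and forms
$W = y_1y_2' - y_1'y_2$.

```python
import math

def bump(x):
    """(value, derivative) of exp(-1/x^2), extended by zero at x = 0."""
    if x == 0.0:
        return 0.0, 0.0
    e = math.exp(-1.0 / x**2)
    return e, 2.0 * e / x**3

def y1(x):
    """(y1, y1'): non-zero only for x > 0."""
    return bump(x) if x > 0 else (0.0, 0.0)

def y2(x):
    """(y2, y2'): non-zero only for x < 0."""
    return bump(x) if x < 0 else (0.0, 0.0)

def wronskian(x):
    (f, df), (g, dg) = y1(x), y2(x)
    return f * dg - df * g

grid = [k / 10 for k in range(-30, 31)]
values = [wronskian(x) for x in grid]
```

Every entry of `values` is zero because at each point one of the two functions
vanishes together with its derivative, yet neither function is a constant
multiple of the other: `y1(1.0)[0]` is non-zero where `y2(1.0)[0]` vanishes,
and conversely at $x = -1$.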

Given $n$ linearly independent smooth functions $y_i$, can we always find an
$n$-th order differential equation that has them as its solutions? The answer
had better be “no”, or there would be a contradiction between the preceding
theorem and the counterexample to its extension. If the functions do satisfy
a common equation, however, we can use a Wronskian to construct it: Let
\[
Ly = p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y \tag{3.29}
\]
be the differential polynomial in $y(x)$ that results from expanding
\[
D(y) = \begin{vmatrix}
y^{(n)} & y^{(n-1)} & \cdots & y \\
y_1^{(n)} & y_1^{(n-1)} & \cdots & y_1 \\
\vdots & \vdots & \ddots & \vdots \\
y_n^{(n)} & y_n^{(n-1)} & \cdots & y_n
\end{vmatrix}. \tag{3.30}
\]
Whenever $y$ coincides with any of the $y_i$, the determinant will have two
identical rows, and so $Ly = 0$. The $y_i$ are indeed $n$ solutions of $Ly = 0$. As
we have noted, this construction cannot always work. To see what can go
wrong, observe that it gives
\[
p_0(x) = \begin{vmatrix}
y_1^{(n-1)} & y_1^{(n-2)} & \cdots & y_1 \\
y_2^{(n-1)} & y_2^{(n-2)} & \cdots & y_2 \\
\vdots & \vdots & \ddots & \vdots \\
y_n^{(n-1)} & y_n^{(n-2)} & \cdots & y_n
\end{vmatrix}. \tag{3.31}
\]
If this Wronskian is zero, then our construction fails to deliver an $n$-th order
equation. Indeed, taking $y_1$ and $y_2$ to be the functions in the example above
yields an equation in which all three coefficients $p_0$, $p_1$, $p_2$ are identically
zero.


3.2 Normal form 
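In elementary algebra a polynomial equation
\[
a_0 x^n + a_1 x^{n-1} + \cdots + a_n = 0, \tag{3.32}
\]
with $a_0 \ne 0$, is said to be in normal form if $a_1 = 0$. We can always put such an
equation in normal form by defining a new variable $\tilde x$ with $x = \tilde x - a_1(na_0)^{-1}$.

By analogy, an $n$-th order linear ODE with no $y^{(n-1)}$ term is also said to
be in normal form. We can put an ODE in normal form by the substitution
$y = w\tilde y$, for a suitable function $w(x)$. Let
\[
p_0 y^{(n)} + p_1 y^{(n-1)} + \cdots + p_n y = 0. \tag{3.33}
\]
Set $y = w\tilde y$. Using Leibniz' rule, we expand out
\[
(w\tilde y)^{(n)} = w\tilde y^{(n)} + nw'\tilde y^{(n-1)} + \frac{n(n-1)}{2!}w''\tilde y^{(n-2)} + \cdots + w^{(n)}\tilde y. \tag{3.34}
\]
The differential equation becomes, therefore,
\[
(wp_0)\tilde y^{(n)} + (p_1 w + p_0 nw')\tilde y^{(n-1)} + \cdots = 0. \tag{3.35}
\]
We see that if we choose $w$ to be a solution of
\[
p_1 w + p_0 nw' = 0, \tag{3.36}
\]
for example
\[
w(x) = \exp\Big\{-\frac{1}{n}\int_0^x \frac{p_1(\xi)}{p_0(\xi)}\,d\xi\Big\}, \tag{3.37}
\]
then $\tilde y$ obeys the equation
\[
(wp_0)\tilde y^{(n)} + \tilde p_2\,\tilde y^{(n-2)} + \cdots = 0, \tag{3.38}
\]
with no second-highest derivative.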
Example: For a second order equation,
\[
y'' + p_1 y' + p_2 y = 0, \tag{3.39}
\]
we set $y(x) = v(x)\exp\{-\frac12\int_0^x p_1(\xi)\,d\xi\}$ and find that $v$ obeys
\[
v'' + \Omega v = 0, \tag{3.40}
\]
where
\[
\Omega = p_2 - \frac12 p_1' - \frac14 p_1^2. \tag{3.41}
\]
Reducing an equation to normal form gives us the best chance of solving
it by inspection. For physicists, another advantage is that a second-order
equation in normal form can be thought of as a Schrödinger equation,
\[
-\frac{d^2\psi}{dx^2} + (V(x) - E)\psi = 0, \tag{3.42}
\]
and we can gain insight into the properties of the solution by bringing our
physics intuition and experience to bear.
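
To see the reduction at work, consider an assumed example (not taken from the
text): $p_1 = 2x$ and $p_2 = x^2 + 5$. Then $\Omega = p_2 - \frac12p_1' - \frac14p_1^2 = 4$,
so $v = \cos 2x$ and $y = \cos(2x)\,e^{-x^2/2}$ should solve $y'' + p_1y' + p_2y = 0$.
The sketch checks both statements with finite differences.

```python
import math

def p1(x):
    return 2.0 * x

def p2(x):
    return x * x + 5.0

def omega(x, h=1e-5):
    """Omega = p2 - p1'/2 - p1^2/4, with p1' from a central difference."""
    dp1 = (p1(x + h) - p1(x - h)) / (2 * h)
    return p2(x) - 0.5 * dp1 - 0.25 * p1(x) ** 2

def y(x):
    """v(x) exp(-(1/2) int_0^x p1), with v = cos(2x) solving v'' + 4v = 0."""
    return math.cos(2.0 * x) * math.exp(-0.5 * x * x)

def residual(x, h=1e-4):
    """y'' + p1 y' + p2 y, with derivatives from central differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return d2 + p1(x) * d1 + p2(x) * y(x)
```

`omega(x)` should return 4 at every `x`, and `residual(x)` should vanish up to
the finite-difference error.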


3.3 Inhomogeneous equations
A linear inhomogeneous equation is one with a source term:
\[
p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y = f(x). \tag{3.43}
\]
It is called “inhomogeneous” because the source term $f(x)$ does not contain
$y$, and so is different from the rest. We will devote an entire chapter to
the solution of such equations by the method of Green functions. Here, we
simply review some elementary material.


3.3.1 Particular integral and complementary function 


One method of dealing with inhomogeneous problems, one that is especially
effective when the equation has constant coefficients, is simply to try to
guess a solution to (3.43). If you are successful, the guessed solution $y_{PI}$
is then called a particular integral. We may add any solution $y_{CF}$ of the
homogeneous equation
\[
p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y = 0 \tag{3.44}
\]
to $y_{PI}$ and it will still be a solution of the inhomogeneous problem. We
use this freedom to satisfy the boundary or initial conditions. The added
solution, $y_{CF}$, is called the complementary function.

Example: Charging capacitor. The capacitor in the circuit in figure 3.1 is
initially uncharged. The switch is closed at $t = 0$.


Figure 3.1: Capacitor circuit


The charge on the capacitor, $Q$, obeys
\[
R\frac{dQ}{dt} + \frac{Q}{C} = V, \tag{3.45}
\]
where $R$, $C$, $V$ are constants. A particular integral is given by $Q(t) = CV$.
The complementary-function solution of the homogeneous problem is
\[
Q(t) = Q_0\,e^{-t/RC}, \tag{3.46}
\]
where $Q_0$ is constant. The solution satisfying the initial conditions is
\[
Q(t) = CV\big(1 - e^{-t/RC}\big). \tag{3.47}
\]


3.3.2 Variation of parameters 
We now follow Lagrange, and solve
\[
p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \cdots + p_n(x)y = f(x) \tag{3.48}
\]
by writing
\[
y = v_1 y_1 + v_2 y_2 + \cdots + v_n y_n, \tag{3.49}
\]
where the $y_i$ are the $n$ linearly independent solutions of the homogeneous
equation and the $v_i$ are functions of $x$ that we have to determine. This
method is called variation of parameters.

Now, differentiating gives
\[
y' = v_1 y_1' + v_2 y_2' + \cdots + v_n y_n' + \{v_1' y_1 + v_2' y_2 + \cdots + v_n' y_n\}. \tag{3.50}
\]
We will choose the $v$'s so as to make the terms in the braces vanish.
Differentiate again:
\[
y'' = v_1 y_1'' + v_2 y_2'' + \cdots + v_n y_n'' + \{v_1' y_1' + v_2' y_2' + \cdots + v_n' y_n'\}. \tag{3.51}
\]
Again, we will choose the $v$'s to make the terms in the braces vanish. We
proceed in this way until the very last step, at which we demand
\[
\{v_1' y_1^{(n-1)} + v_2' y_2^{(n-1)} + \cdots + v_n' y_n^{(n-1)}\} = f(x)/p_0(x). \tag{3.52}
\]
If you substitute the resulting $y$ into the differential equation, you will see
that the equation is satisfied.

We have imposed the following conditions on the $v_i'$:
\[
\begin{aligned}
v_1' y_1 + v_2' y_2 + \cdots + v_n' y_n &= 0, \\
v_1' y_1' + v_2' y_2' + \cdots + v_n' y_n' &= 0, \\
&\ \ \vdots \\
v_1' y_1^{(n-1)} + v_2' y_2^{(n-1)} + \cdots + v_n' y_n^{(n-1)} &= f(x)/p_0(x).
\end{aligned} \tag{3.53}
\]
This system of linear equations will have a solution for $v_1',\ldots,v_n'$, provided
the Wronskian of the $y_i$ is non-zero. This, however, is guaranteed by the
assumed linear independence of the $y_i$. Having found the $v_1',\ldots,v_n'$, we obtain
the $v_1,\ldots,v_n$ themselves by a single integration.


Example: First-order linear equation. A simple and useful application of this
method solves
\[
\frac{dy}{dx} + P(x)y = f(x). \tag{3.54}
\]
The solution to the homogeneous equation is
\[
y_1 = e^{-\int_a^x P(s)\,ds}. \tag{3.55}
\]
We therefore set
\[
y = v(x)\,e^{-\int_a^x P(s)\,ds}, \tag{3.56}
\]
and find that
\[
v'(x)\,e^{-\int_a^x P(s)\,ds} = f(x). \tag{3.57}
\]
We integrate once to find
\[
v(x) = \int_b^x f(\xi)\,e^{\int_a^\xi P(s)\,ds}\,d\xi, \tag{3.58}
\]
and so
\[
y(x) = \int_b^x f(\xi)\Big\{e^{-\int_\xi^x P(s)\,ds}\Big\}d\xi. \tag{3.59}
\]
We select $b$ to satisfy the initial condition.
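
Formula (3.59) can be tried out on the charging-capacitor equation (3.45),
rewritten as $dQ/dt + Q/(RC) = V/R$. The sketch below assumes illustrative
values $R = 2$, $C = 0.5$, $V = 3$ and evaluates the nested integrals with a
Simpson rule; the names `simpson` and `solve_first_order` are our own.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3

def solve_first_order(P, f, b, x):
    """y(x) = int_b^x f(xi) exp(-int_xi^x P(s) ds) dxi, i.e. (3.59) with y(b) = 0."""
    kernel = lambda xi: math.exp(-simpson(P, xi, x))
    return simpson(lambda xi: f(xi) * kernel(xi), b, x)

# Assumed sample values for the capacitor circuit.
R, C, V = 2.0, 0.5, 3.0
Q = lambda t: solve_first_order(lambda s: 1.0 / (R * C), lambda s: V / R, 0.0, t)
```

Choosing $b = 0$ enforces $Q(0) = 0$, and `Q(t)` should then agree with the
closed form $CV(1 - e^{-t/RC})$ of (3.47).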


3.4 Singular points 


So far in this chapter, we have been assuming, either explicitly or tacitly, that
our coefficients $p_i(x)$ are smooth, and that $p_0(x)$ never vanishes. If $p_0(x)$ does
become zero (or, more precisely, if one or more of the $p_i/p_0$ becomes singular)
then dramatic things happen, and the location of the zero of $p_0$ is called a
singular point of the differential equation. All other points are called ordinary
points.

In physics applications we often find singular points at the ends of the
interval in which we wish to solve our differential equation. For example, the
origin $r = 0$ is often a singular point when $r$ is the radial coordinate in plane
or spherical polars. The existence and uniqueness theorems that we have
relied on throughout this chapter may fail at singular endpoints. Consider,
for example, the equation
\[
xy'' + y' = 0, \tag{3.60}
\]
which is singular at $x = 0$. The two linearly independent solutions for $x > 0$
are $y_1(x) = 1$ and $y_2(x) = \ln x$. The general solution is therefore $A + B\ln x$,
but no choice of $A$ and $B$ can satisfy the initial conditions $y(0) = a$, $y'(0) = b$
when $b$ is non-zero. Because of these complications, we will delay a systematic
study of singular endpoints until chapter 8.


3.4.1 Regular singular points 


If, in the differential equation
\[
p_0 y'' + p_1 y' + p_2 y = 0, \tag{3.61}
\]
we have a point $x = a$ such that
\[
p_0(x) = (x - a)^2 P(x), \quad p_1(x) = (x - a)Q(x), \quad p_2(x) = R(x), \tag{3.62}
\]
where $P$ and $Q$ and $R$ are analytic[1] and $P$ and $Q$ non-zero in a neighbourhood
of $a$ then the point $x = a$ is called a regular singular point of the equation.
All other singular points are said to be irregular. Close to a regular singular
point $a$ the equation looks like
\[
P(a)(x - a)^2 y'' + Q(a)(x - a)y' + R(a)y = 0. \tag{3.63}
\]
The solutions of this reduced equation are
\[
y_1 = (x - a)^{\lambda_1}, \quad y_2 = (x - a)^{\lambda_2}, \tag{3.64}
\]
where $\lambda_{1,2}$ are the roots of the indicial equation
\[
\lambda(\lambda - 1)P(a) + \lambda Q(a) + R(a) = 0. \tag{3.65}
\]
The solutions of the full equation are then
\[
y_1 = (x - a)^{\lambda_1}f_1(x), \quad y_2 = (x - a)^{\lambda_2}f_2(x), \tag{3.66}
\]
where $f_{1,2}$ have power series solutions convergent in a neighbourhood of $a$.
An exception occurs when $\lambda_1$ and $\lambda_2$ coincide or differ by an integer, in which
case the second solution is of the form
\[
y_2 = (x - a)^{\lambda_1}\big(\ln(x - a)\,f_1(x) + f_2(x)\big), \tag{3.67}
\]
where $f_1$ is the same power series that occurs in the first solution, and $f_2$ is
a new power series. You will probably have seen these statements proved by
the tedious procedure of setting
\[
f_1(x) = (x - a)^{\lambda}\big(b_0 + b_1(x - a) + b_2(x - a)^2 + \cdots\big), \tag{3.68}
\]
and obtaining a recurrence relation determining the $b_i$. Far more insight is
obtained, however, by extending the equation and its solution to the complex
plane, where the structure of the solution is related to its monodromy
properties. If you are familiar with complex analytic methods, you might like
to look ahead to the discussion of monodromy in section 19.2.
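
The indicial equation (3.65) is a quadratic in $\lambda$, so its roots are easily
computed. The following sketch uses a helper name of our own and two assumed
test cases: Bessel's equation $x^2y'' + xy' + (x^2 - \nu^2)y = 0$ at $a = 0$, for
which $P(0) = Q(0) = 1$ and $R(0) = -\nu^2$, and equation (3.60) multiplied
through by $x$, for which $P = Q = 1$ and $R = 0$.

```python
import cmath

def indicial_roots(P_a, Q_a, R_a):
    """Roots of lambda (lambda - 1) P(a) + lambda Q(a) + R(a) = 0,
    written as A lambda^2 + B lambda + C = 0 (roots may be complex)."""
    A, B, C = P_a, Q_a - P_a, R_a
    disc = cmath.sqrt(B * B - 4 * A * C)
    return (-B + disc) / (2 * A), (-B - disc) / (2 * A)

# Bessel's equation with nu = 1.5: the exponents are +nu and -nu.
bessel = indicial_roots(1.0, 1.0, -1.5 ** 2)

# x^2 y'' + x y' = 0: a repeated root at 0, so a logarithm must appear.
log_case = indicial_roots(1.0, 1.0, 0.0)
```

The repeated root in the second case is consistent with the solutions $1$ and
$\ln x$ found for (3.60).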


[1] A function is analytic at a point if it has a power-series expansion that converges to
the function in a neighbourhood of the point.



3.5 Further exercises and problems 


Exercise 3.1: Reduction of Order. Sometimes additional information about
the solutions of a differential equation enables us to reduce the order of the
equation, and so solve it.

a) Suppose that we know that $y_1 = u(x)$ is one solution to the equation
\[
y'' + V(x)y = 0.
\]
By trying $y = u(x)v(x)$ show that
\[
y_2 = u(x)\int^x \frac{d\xi}{u^2(\xi)}
\]
is also a solution of the differential equation. Is this new solution ever
merely a constant multiple of the old solution, or must it be linearly
independent? (Hint: evaluate the Wronskian $W(y_2, y_1)$.)

b) Suppose that we are told that the product, $y_1y_2$, of the two solutions to
the equation $y'' + p_1y' + p_2y = 0$ is a constant. Show that this requires
\[
2p_1p_2 + p_2' = 0.
\]

c) By using ideas from part b) or otherwise, find the general solution of the
equation
\[
(x + 1)x^2y'' + xy' - (x + 1)^3y = 0.
\]


Exercise 3.2: Show that the general solution of the differential equation
\[
\frac{d^2y}{dx^2} - 2\frac{dy}{dx} + y = \frac{e^x}{1 + x^2}
\]
is
\[
y(x) = Ae^x + Bxe^x - \frac12 e^x\ln(1 + x^2) + xe^x\tan^{-1}x.
\]


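Exercise 3.3: Use the method of variation of parameters to show that if $y_1(x)$
and $y_2(x)$ are linearly independent solutions to the equation
\[
p_0(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_2(x)y = 0,
\]
then the general solution of the equation
\[
p_0(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_2(x)y = f(x)
\]
is
\[
y(x) = Ay_1(x) + By_2(x) - y_1(x)\int^x \frac{y_2(\xi)f(\xi)}{p_0\,W(y_1, y_2)}\,d\xi
+ y_2(x)\int^x \frac{y_1(\xi)f(\xi)}{p_0\,W(y_1, y_2)}\,d\xi.
\]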
Problem 3.4: One-dimensional scattering theory. Consider the one-dimensional
Schrödinger equation
\[
-\frac{d^2\psi}{dx^2} + V(x)\psi = E\psi,
\]
where $V(x)$ is zero except in a finite interval $[-a, a]$ near the origin.


Figure 3.2: A typical potential V for exercise 3.4


Let $L$ denote the left asymptotic region, $-\infty < x < -a$, and similarly let $R$
denote $a < x < \infty$. For $E = k^2$ there will be scattering solutions of the form
\[
\psi_k(x) = \begin{cases}
e^{ikx} + r_L(k)e^{-ikx}, & x \in L, \\
t_L(k)e^{ikx}, & x \in R,
\end{cases}
\]
which for $k > 0$ describe waves incident on the potential $V(x)$ from the left.
There will be solutions with
\[
\psi_k(x) = \begin{cases}
t_R(k)e^{ikx}, & x \in L, \\
e^{ikx} + r_R(k)e^{-ikx}, & x \in R,
\end{cases}
\]
which for $k < 0$ describe waves incident from the right. The wavefunctions
in $[-a, a]$ will naturally be more complicated. Observe that $[\psi_k(x)]^*$ is also a
solution of the Schrödinger equation.

By using properties of the Wronskian, show that:

a) $|r_{L,R}|^2 + |t_{L,R}|^2 = 1$,

b) $t_L(k) = t_R(-k)$.

c) Deduce from parts a) and b) that $|r_L(k)| = |r_R(-k)|$.

d) Take the specific example of $V(x) = \delta(x - b)$ with $|b| < a$. Compute the
transmission and reflection coefficients and hence show that $r_L(k)$ and
$r_R(-k)$ may differ in phase.
Exercise 3.5: Suppose $\psi(x)$ obeys a Schrödinger equation
\[
\Big(-\frac{d^2}{dx^2} + [V(x) - E]\Big)\psi = 0.
\]

a) Make a smooth and invertible change of independent variable by setting
$x = x(z)$ and find the second order differential equation in $z$ obeyed by
$\Psi(z) \equiv \psi(x(z))$. Reduce this equation to normal form, and show that
the resulting equation is
\[
\Big(-\frac{d^2}{dz^2} + (x')^2[V(x(z)) - E] - \frac12\{x, z\}\Big)\tilde\psi(z) = 0,
\]
where the primes denote differentiation with respect to $z$, and
\[
\{x, z\} \stackrel{\rm def}{=} \frac{x'''}{x'} - \frac32\Big(\frac{x''}{x'}\Big)^2
\]
is called the Schwarzian derivative of $x$ with respect to $z$. Schwarzian
derivatives play an important role in conformal field theory and string
theory.

b) Make a sequence of changes of variable $x \to z \to w$, and so establish
Cayley's identity
\[
\Big(\frac{dz}{dw}\Big)^2\{x, z\} + \{z, w\} = \{x, w\}.
\]
(Hint: If your proof takes more than one line, you are missing the point.)


Chapter 4 


Linear Differential Operators 


In this chapter we will begin to take a more sophisticated approach to dif- 
ferential equations. We will define, with some care, the notion of a linear 
differential operator, and explore the analogy between such operators and 
matrices. In particular, we will investigate what is required for a linear dif- 
ferential operator to have a complete set of eigenfunctions. 


4.1 Formal vs. concrete operators 


We will call the object
\[
L = p_0(x)\frac{d^n}{dx^n} + p_1(x)\frac{d^{n-1}}{dx^{n-1}} + \cdots + p_n(x), \tag{4.1}
\]
which we also write as
\[
p_0(x)\partial_x^n + p_1(x)\partial_x^{n-1} + \cdots + p_n(x), \tag{4.2}
\]


a formal linear differential operator. The word “formal” refers to the fact 
that we are not yet worrying about what sort of functions the operator is 
applied to. 


4.1.1 The algebra of formal operators 


Even though they are not acting on anything in particular, we can still form 
products of operators. For example, if $v$ and $w$ are smooth functions of $x$ we
can define the operators $\partial_x + v(x)$ and $\partial_x + w(x)$ and find

$$ (\partial_x + v)(\partial_x + w) = \partial_x^2 + w' + (w+v)\partial_x + vw, \tag{4.3} $$

or

$$ (\partial_x + w)(\partial_x + v) = \partial_x^2 + v' + (w+v)\partial_x + vw. \tag{4.4} $$

We see from this example that the operator algebra is not usually commutative.
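
The failure of commutativity can be made explicit by letting both products act on a test function. The following short worked check is only a sketch; the test function $f$ is introduced here for illustration and does not appear in the text above.

```latex
\begin{align*}
(\partial_x + v)(\partial_x + w)f &= f'' + (wf)' + vf' + vwf
   = f'' + w'f + (w+v)f' + vwf,\\
(\partial_x + w)(\partial_x + v)f &= f'' + v'f + (w+v)f' + vwf,\\
[\partial_x + v,\ \partial_x + w]\,f &= (w' - v')f.
\end{align*}
```

The two orderings therefore agree only when $w - v$ is a constant.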
The algebra of formal operators has some deep applications. Consider, 
for example, the operators 


$$ L = -\partial_x^2 + q(x) \tag{4.5} $$

and

$$ P = \partial_x^3 + a(x)\partial_x + \partial_x a(x). \tag{4.6} $$


In the last expression, the combination $\partial_x a(x)$ means “first multiply by $a(x)$,
and then differentiate the result,” so we could also write


$$ \partial_x a = a\partial_x + a'. \tag{4.7} $$


We can now form the commutator $[P, L] = PL - LP$. After a little effort,
we find

$$ [P, L] = (3q' + 4a')\partial_x^2 + (3q'' + 4a'')\partial_x + q''' + 2aq' + a'''. \tag{4.8} $$

If we choose $a = -\frac{3}{4}q$, the commutator becomes a pure multiplication operator,
with no differential part:

$$ [P, L] = \frac{1}{4}q''' - \frac{3}{2}qq'. \tag{4.9} $$
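
The "little effort" behind (4.8) and (4.9) can be checked mechanically. The sketch below is an illustration only, not the authors' derivation: it restricts $q$ and a test function $f$ to polynomials with exact rational coefficients, so that both sides of (4.9) can be compared exactly. All function names (`trim`, `add`, `mul`, `diff`, `make_lax_pair`, `commutator_defect`) and the particular polynomials are hypothetical choices made here.

```python
from fractions import Fraction

# Polynomials are coefficient lists [c0, c1, c2, ...] meaning c0 + c1*x + c2*x^2 + ...

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def scale(c, p):
    return trim([c * x for x in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def diff(p, times=1):
    """Differentiate a polynomial the given number of times."""
    for _ in range(times):
        p = [k * p[k] for k in range(1, len(p))]
    return trim(p)

def make_lax_pair(q):
    """Return (P, L) acting on polynomials, with a = -(3/4) q as in the text."""
    a = scale(Fraction(-3, 4), q)

    def L(f):  # L = -d^2/dx^2 + q
        return add(scale(-1, diff(f, 2)), mul(q, f))

    def P(f):  # P = d^3/dx^3 + a d/dx + d/dx a
        return add(add(diff(f, 3), mul(a, diff(f))), diff(mul(a, f)))

    return P, L

def commutator_defect(q, f):
    """[P, L] f minus ((1/4) q''' - (3/2) q q') f, as a polynomial."""
    P, L = make_lax_pair(q)
    lhs = add(P(L(f)), scale(-1, L(P(f))))
    multiplier = add(scale(Fraction(1, 4), diff(q, 3)),
                     scale(Fraction(-3, 2), mul(q, diff(q))))
    return add(lhs, scale(-1, mul(multiplier, f)))

q_test = [Fraction(c) for c in (1, -2, 0, 3, 1)]   # arbitrary test potential
f_test = [Fraction(c) for c in (2, 1, -1, 0, 5)]   # arbitrary test function
defect = commutator_defect(q_test, f_test)
print(defect)  # an empty list means (4.9) holds exactly for this q and f
```

Exact rational arithmetic is used so that no tolerance is needed: the defect is either the zero polynomial or it is not.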
The equation
$$ \frac{dL}{dt} = [P, L], \tag{4.10} $$
or, equivalently,
$$ \dot q = \frac{1}{4}q''' - \frac{3}{2}qq', \tag{4.11} $$
has a formal solution
$$ L(t) = e^{tP}L(0)e^{-tP}, \tag{4.12} $$


showing that the time evolution of L is given by a similarity transformation, 
which (again formally) does not change its eigenvalues. The partial differential
equation (4.11) is the famous Korteweg-de Vries (KdV) equation, which
has “soliton” solutions whose existence is intimately connected with the fact
that it can be written as (4.10). The operators $P$ and $L$ are called a Lax
pair, after Peter Lax, who uncovered much of the structure.


4.1.2 Concrete operators


We want to explore the analogies between linear differential operators and 
matrices acting on a finite-dimensional vector space. Because the theory of 
matrix operators makes much use of inner products and orthogonality, the 
analogy is closest if we work with a function space equipped with these same 
notions. We therefore let our differential operators act on $L^2[a,b]$, the Hilbert
space of square-integrable functions on [a,b]. Now a differential operator 
cannot act on every function in the Hilbert space because not all of them 
are differentiable. Even though we will relax our notion of differentiability 
and permit weak derivatives, we must at least demand that the domain D, 
the subset of functions on which we allow the operator to act, contain only 
functions that are sufficiently differentiable that the function resulting from 
applying the operator remains an element of $L^2[a,b]$. We will usually restrict
the set of functions even further, by imposing boundary conditions at the
endpoints of the interval. A linear differential operator is now defined as a
formal linear differential operator, together with a specification of its domain
D.

The boundary conditions that we will impose will always be linear and 
homogeneous. This is so that the domain of definition is a vector space. 
In other words, if $y_1$ and $y_2$ obey the boundary conditions then so should
$\lambda y_1 + \mu y_2$. Thus, for a second-order operator


$$ L = p_0\partial_x^2 + p_1\partial_x + p_2 \tag{4.13} $$

on the interval $[a,b]$, we might impose


$$ B_1[y] = \alpha_{11}y(a) + \alpha_{12}y'(a) + \beta_{11}y(b) + \beta_{12}y'(b) = 0, $$
$$ B_2[y] = \alpha_{21}y(a) + \alpha_{22}y'(a) + \beta_{21}y(b) + \beta_{22}y'(b) = 0, \tag{4.14} $$


but we will not, in defining the differential operator, impose inhomogeneous 
conditions, such as 


$$ B_1[y] = \alpha_{11}y(a) + \alpha_{12}y'(a) + \beta_{11}y(b) + \beta_{12}y'(b) = A, $$
$$ B_2[y] = \alpha_{21}y(a) + \alpha_{22}y'(a) + \beta_{21}y(b) + \beta_{22}y'(b) = B, \tag{4.15} $$


with non-zero $A$, $B$, even though we will solve differential equations with
such boundary conditions.

Also, for an $n$-th order operator, we will not constrain derivatives of order
higher than $n-1$. This is reasonable¹: if we seek solutions of $Ly = f$ with $L$
a second-order operator, for example, then the values of $y''$ at the endpoints
are already determined in terms of $y'$ and $y$ by the differential equation. We
cannot choose to impose some other value. By differentiating the equation
enough times, we can similarly determine all higher endpoint derivatives in
terms of $y$ and $y'$. These two derivatives, therefore, are all we can fix by fiat.

The boundary and differentiability conditions that we impose make D a 
subset of the entire Hilbert space. This subset will always be dense: any 
element of the Hilbert space can be obtained as an $L^2$ limit of functions in
D. In particular, there will never be a function in $L^2[a,b]$ that is orthogonal
to all functions in D. 


4.2 The adjoint operator 


One of the important properties of matrices, established in the appendix, 
is that a matrix that is selfadjoint, or Hermitian, may be diagonalized. In 
other words, the matrix has sufficiently many eigenvectors for them to form 
a basis for the space on which it acts. A similar property holds for self- 
adjoint differential operators, but we must be careful in our definition of 
self-adjointness. 

Before reading this section, we suggest you review the material on adjoint
operators on finite-dimensional spaces that appears in the appendix. 


4.2.1 The formal adjoint 


Given a formal differential operator 


$$ L = p_0(x)\frac{d^n}{dx^n} + p_1(x)\frac{d^{n-1}}{dx^{n-1}} + \cdots + p_n(x), \tag{4.16} $$

and a weight function $w(x)$, real and positive on the interval $(a,b)$, we can
find another such operator $L^\dagger$, such that, for any sufficiently differentiable
$u(x)$ and $v(x)$, we have

$$ w\left(u^* Lv - v(L^\dagger u)^*\right) = \frac{d}{dx}Q[u,v], \tag{4.17} $$

for some function $Q$, which depends bilinearly on $u$ and $v$ and their first $n-1$
derivatives. We call $L^\dagger$ the formal adjoint of $L$ with respect to the weight $w$.
The equation (4.17) is called Lagrange's identity. The reason for the name
“adjoint” is that if we define an inner product

$$ \langle u, v\rangle_w = \int_a^b w\,u^* v\,dx, \tag{4.18} $$

and if the functions $u$ and $v$ have boundary conditions that make $Q[u,v]|_a^b = 0$,
then

$$ \langle u, Lv\rangle_w = \langle L^\dagger u, v\rangle_w, \tag{4.19} $$

which is the defining property of the adjoint operator on a vector space. The
word “formal” means, as before, that we are not yet specifying the domain
of the operator.

¹ There is a deeper reason which we will explain in section 9.7.2.

The method for finding the formal adjoint is straightforward: integrate 
by parts enough times to get all the derivatives off v and on to u. 


Example: If

$$ L = -i\frac{d}{dx}, \tag{4.20} $$

then let us find the adjoint $L^\dagger$ with respect to the weight $w \equiv 1$. We start
from

$$ u^*(Lv) = u^*\left(-i\frac{d}{dx}v\right), $$

and use the integration-by-parts technique once to get the derivative off $v$
and onto $u^*$:

$$ u^*\left(-i\frac{d}{dx}v\right) = \left(i\frac{d}{dx}u^*\right)v - i\frac{d}{dx}(u^* v) $$
$$ = \left(-i\frac{d}{dx}u\right)^* v - i\frac{d}{dx}(u^* v) $$
$$ \equiv v(L^\dagger u)^* + \frac{d}{dx}Q[u,v]. \tag{4.21} $$

We have ended up with the Lagrange identity

$$ u^*\left(-i\frac{d}{dx}v\right) - v\left(-i\frac{d}{dx}u\right)^* = \frac{d}{dx}(-iu^* v), \tag{4.22} $$

and found that

$$ L^\dagger = -i\frac{d}{dx}, \qquad Q[u,v] = -iu^* v. \tag{4.23} $$

The operator $-i\,d/dx$ (which you should recognize as the “momentum” operator
from quantum mechanics) obeys $L = L^\dagger$, and is therefore formally
self-adjoint, or Hermitian.

Example: Let


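$$ L = p_0\frac{d^2}{dx^2} + p_1\frac{d}{dx} + p_2, \tag{4.24} $$

with the $p_i$ all real. Again let us find the adjoint $L^\dagger$ with respect to the inner
product with $w \equiv 1$. Now, proceeding as above, but integrating by parts
twice, we find

$$ u^*\left[p_0v'' + p_1v' + p_2v\right] - v\left[(p_0u)'' - (p_1u)' + p_2u\right]^* = \frac{d}{dx}\left[p_0(u^* v' - vu^{*\prime}) + (p_1 - p_0')u^* v\right]. \tag{4.25} $$

From this we read off that

$$ L^\dagger = \frac{d^2}{dx^2}p_0 - \frac{d}{dx}p_1 + p_2 $$
$$ = p_0\frac{d^2}{dx^2} + (2p_0' - p_1)\frac{d}{dx} + (p_0'' - p_1' + p_2). \tag{4.26} $$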


What conditions do we need to impose on $p_{0,1,2}$ for this $L$ to be formally
self-adjoint with respect to the inner product with $w \equiv 1$? For $L = L^\dagger$ we
need

$$ p_0 = p_0, $$
$$ 2p_0' - p_1 = p_1 \;\Rightarrow\; p_0' = p_1, $$
$$ p_0'' - p_1' + p_2 = p_2 \;\Rightarrow\; p_0'' = p_1'. \tag{4.27} $$

We therefore require that $p_1 = p_0'$, and so

$$ L = \frac{d}{dx}\left(p_0\frac{d}{dx}\right) + p_2, \tag{4.28} $$

which we recognize as a Sturm-Liouville operator.
Example: Reduction to Sturm-Liouville form. Another way to make the
operator

$$ L = p_0\frac{d^2}{dx^2} + p_1\frac{d}{dx} + p_2, \tag{4.29} $$

self-adjoint is by a suitable choice of weight function $w$. Suppose that $p_0$ is
positive on the interval $(a,b)$, and that $p_0$, $p_1$, $p_2$ are all real. Then we may
define

$$ w = \frac{1}{p_0}\exp\left\{\int_a^x \left(\frac{p_1}{p_0}\right)dx'\right\} \tag{4.30} $$

and observe that it is positive on $(a,b)$, and that

$$ Ly = \frac{1}{w}(wp_0y')' + p_2y. \tag{4.31} $$

Now

$$ \langle u, Lv\rangle_w - \langle Lu, v\rangle_w = \left[wp_0(u^* v' - u^{*\prime}v)\right]_a^b, \tag{4.32} $$

where

$$ \langle u, v\rangle_w = \int_a^b w\,u^* v\,dx. \tag{4.33} $$


Thus, provided $p_0$ does not vanish, there is always some inner product with
respect to which a real second-order differential operator is formally
self-adjoint.

Note that with

$$ Ly = \frac{1}{w}(wp_0y')' + p_2y, \tag{4.34} $$

the eigenvalue equation

$$ Ly = \lambda y \tag{4.35} $$

can be written

$$ (wp_0y')' + p_2wy = \lambda wy. \tag{4.36} $$


When you come across a differential equation where, in the term containing
the eigenvalue $\lambda$, the eigenfunction is being multiplied by some other function,
you should immediately suspect that the operator will turn out to be
self-adjoint with respect to the inner product having this other function as its
weight.
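
As a brief illustration of the recipe (4.30), consider the Hermite-type operator $L = d^2/dx^2 - 2x\,d/dx$. This operator is chosen here as an example and is not taken from the text above. With $p_0 = 1$, $p_1 = -2x$, $p_2 = 0$, and dropping the constant factor that comes from the lower limit of the integral, one finds:

```latex
\begin{align*}
w(x) &= \frac{1}{p_0}\exp\left\{\int^x \frac{p_1}{p_0}\,dx'\right\} = e^{-x^2},\\
Ly &= y'' - 2xy' = e^{x^2}\left(e^{-x^2}y'\right)',\\
Ly = \lambda y \quad&\Longleftrightarrow\quad \left(e^{-x^2}y'\right)' = \lambda\,e^{-x^2}\,y.
\end{align*}
```

In the last line the eigenfunction is multiplied by $e^{-x^2}$ in the eigenvalue term, and this is indeed the weight with respect to which the operator is formally self-adjoint.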

Illustration (Bargmann-Fock space): This is a more exotic example of a 
formal adjoint. You may have met with it in quantum mechanics. Consider 
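the space of polynomials $P(z)$ in the complex variable $z = x + iy$. Define an
inner product by

$$ \langle P, Q\rangle = \frac{1}{\pi}\int d^2z\,e^{-z^* z}\,[P(z)]^*\,Q(z), $$

where $d^2z \equiv dx\,dy$ and the integration is over the entire $x,y$ plane. With
this inner product, we have

$$ \langle z^n, z^m\rangle = n!\,\delta_{nm}. $$

If we define

$$ \hat a = \frac{d}{dz}, $$

then

$$ \langle P, \hat aQ\rangle = \frac{1}{\pi}\int d^2z\,e^{-z^* z}\,[P(z)]^*\,\frac{d}{dz}Q(z) $$
$$ = -\frac{1}{\pi}\int d^2z\left(\frac{d}{dz}e^{-z^* z}[P(z)]^*\right)Q(z) $$
$$ = \frac{1}{\pi}\int d^2z\,e^{-z^* z}\,z^*[P(z)]^*\,Q(z) $$
$$ = \frac{1}{\pi}\int d^2z\,e^{-z^* z}\,[zP(z)]^*\,Q(z) $$
$$ = \langle \hat a^\dagger P, Q\rangle, $$

where $\hat a^\dagger = z$, i.e. the operation of multiplication by $z$. In this case, the
adjoint is not even a differential operator.²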


Exercise 4.1: Consider the differential operator $L = i\,d/dx$. Find the formal
adjoint of $L$ with respect to the inner product $\langle u, v\rangle = \int w\,u^* v\,dx$, and find
the corresponding surface term $Q[u,v]$.


² In deriving this result we have used the Wirtinger calculus where $z$ and $z^*$ are treated
as independent variables so that

$$ \frac{d}{dz}e^{-z^* z} = -z^* e^{-z^* z}, $$

and observed that, because $[P(z)]^*$ is a function of $z^*$ only,

$$ \frac{d}{dz}[P(z)]^* = 0. $$

If you are uneasy at regarding $z$, $z^*$, as independent, you should confirm these formulae
by expressing $z$ and $z^*$ in terms of $x$ and $y$, and using

$$ \frac{d}{dz} \equiv \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right), \qquad \frac{d}{dz^*} \equiv \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right). $$


Exercise 4.2: Sturm-Liouville forms. By constructing appropriate weight functions
$w(x)$ convert the following common operators into Sturm-Liouville form:


a) $L = (1 - x^2)\,d^2/dx^2 + [(\mu - \nu) - (\mu + \nu + 2)x]\,d/dx$.
b) LS Ce ) d? /dx? Se ae 
c) $L = d^2/dx^2 - 2x(1 - x^2)^{-1}\,d/dx - m^2(1 - x^2)^{-1}$.


4.2.2 A simple eigenvalue problem 


A finite Hermitian matrix has a complete set of orthonormal eigenvectors. 
Does the same property hold for a Hermitian differential operator? 
Consider the differential operator 


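$$ T = -\partial_x^2, \qquad D(T) = \{y, Ty \in L^2[0,1] : y(0) = y(1) = 0\}. \tag{4.37} $$

With the inner product

$$ \langle y_1, y_2\rangle = \int_0^1 y_1^* y_2\,dx \tag{4.38} $$

we have

$$ \langle y_1, Ty_2\rangle - \langle Ty_1, y_2\rangle = \left[y_1^{\prime *}y_2 - y_1^* y_2'\right]_0^1 = 0. \tag{4.39} $$

The integrated-out part is zero because both $y_1$ and $y_2$ satisfy the boundary
conditions. We see that

$$ \langle y_1, Ty_2\rangle = \langle Ty_1, y_2\rangle \tag{4.40} $$

and so $T$ is Hermitian or symmetric.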


The eigenfunctions and eigenvalues of $T$ are

$$ y_n(x) = \sin n\pi x, \qquad \lambda_n = n^2\pi^2, \qquad n = 1, 2, \ldots. \tag{4.41} $$

We see that:
i) the eigenvalues are real;
ii) the eigenfunctions for different $\lambda_n$ are orthogonal,

$$ 2\int_0^1 \sin n\pi x\,\sin m\pi x\,dx = \delta_{nm}; \tag{4.42} $$

iii) the normalized eigenfunctions $\phi_n(x) = \sqrt{2}\sin n\pi x$ are complete: any
function in $L^2[0,1]$ has an ($L^2$) convergent expansion as

$$ y(x) = \sum_{n=1}^{\infty} a_n\sqrt{2}\sin n\pi x \tag{4.43} $$

where

$$ a_n = \int_0^1 y(x)\sqrt{2}\sin n\pi x\,dx. \tag{4.44} $$
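
Properties ii) and iii) are easy to probe numerically. The following is a minimal sketch, assuming a simple midpoint-rule quadrature; the helper names (`integrate`, `phi`, `overlap`, `l2_error`) and the test function $y(x) = x$ are hypothetical choices made here. The test function deliberately violates the boundary condition at $x = 1$, to illustrate that the expansion still converges in the $L^2$ sense for functions outside $D(T)$.

```python
import math

def integrate(g, n_steps=4000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n_steps
    return h * sum(g((k + 0.5) * h) for k in range(n_steps))

def phi(n, x):
    """Normalized eigenfunction sqrt(2) sin(n pi x) of T = -d^2/dx^2."""
    return math.sqrt(2.0) * math.sin(n * math.pi * x)

def overlap(n, m):
    """Inner product of phi_n and phi_m, as in (4.42)."""
    return integrate(lambda x: phi(n, x) * phi(m, x))

def coefficients(y, n_terms):
    """Expansion coefficients a_1, ..., a_{n_terms} of (4.44)."""
    return [integrate(lambda x, n=n: y(x) * phi(n, x))
            for n in range(1, n_terms + 1)]

def l2_error(y, n_terms):
    """L^2 norm of y minus the first n_terms terms of the expansion (4.43)."""
    a = coefficients(y, n_terms)

    def residual_sq(x):
        partial = sum(a_n * phi(n, x) for n, a_n in enumerate(a, start=1))
        return (y(x) - partial) ** 2

    return math.sqrt(integrate(residual_sq))

def y_test(x):
    """Test function y(x) = x; note that y(1) is not zero."""
    return x

err_5 = l2_error(y_test, 5)
err_40 = l2_error(y_test, 40)
print(overlap(3, 3), overlap(2, 5))  # diagonal and off-diagonal overlaps
print(err_5, err_40)                 # the L^2 error shrinks as terms are added
```

The error decreases only slowly here because the test function does not satisfy the boundary conditions; pointwise convergence at $x = 1$ fails altogether, since every term of the series vanishes there.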


This all looks very good — exactly the properties we expect for finite Her- 
mitian matrices. Can we carry over all the results of finite matrix theory to 
these Hermitian operators? The answer sadly is no! Here is a counterexam- 
ple: 

Let 


$$ T = -i\partial_x, \qquad D(T) = \{y, Ty \in L^2[0,1] : y(0) = y(1) = 0\}. \tag{4.45} $$

Again

$$ \langle y_1, Ty_2\rangle - \langle Ty_1, y_2\rangle = \int_0^1 dx\,\left\{y_1^*(-i\partial_x y_2) - (-i\partial_x y_1)^* y_2\right\} $$
$$ = -i\left[y_1^* y_2\right]_0^1 = 0. \tag{4.46} $$

Once more, the integrated out part vanishes due to the boundary conditions
satisfied by $y_1$ and $y_2$, so $T$ is nicely Hermitian. Unfortunately, $T$ with these
boundary conditions has no eigenfunctions at all, never mind a complete
set! Any function satisfying $Ty = \lambda y$ will be proportional to $e^{i\lambda x}$, but an
exponential function is never zero, and cannot satisfy the boundary conditions.

It seems clear that the boundary conditions are the problem. We need 
a better definition of “adjoint” than the formal one — one that pays more 
attention to boundary conditions. We will then be forced to distinguish 
between mere Hermiticity, or symmetry, and true self-adjointness. 


Exercise 4.3: Another disconcerting example. Let $p = -i\partial_x$. Show that the
following operator on the infinite real line is formally self-adjoint:

$$ H = x^3p + px^3. \tag{4.47} $$

Now let

$$ \psi_\lambda(x) = |x|^{-3/2}\exp\left\{-\frac{\lambda}{4x^2}\right\}, \tag{4.48} $$

where $\lambda$ is real and positive. Show that

$$ H\psi_\lambda = -i\lambda\psi_\lambda, \tag{4.49} $$

so $\psi_\lambda$ is an eigenfunction with a purely imaginary eigenvalue. Examine the
proof that Hermitian operators have real eigenvalues, and identify at which
point it fails. (Hint: $H$ is formally self-adjoint because it is of the form $T + T^\dagger$.
Now $\psi_\lambda$ is square-integrable, and so an element of $L^2(\mathbb{R})$. Is $T\psi_\lambda$ an element
of $L^2(\mathbb{R})$?)


4.2.3. Adjoint boundary conditions 


The usual definition of the adjoint operator in linear algebra is as follows: 
Given the operator $T : V \to V$ and an inner product $\langle\ ,\ \rangle$, we look at
$\langle u, Tv\rangle$, and ask if there is a $w$ such that $\langle w, v\rangle = \langle u, Tv\rangle$ for all $v$. If there
is, then $u$ is in the domain of $T^\dagger$, and we set $T^\dagger u = w$.

For finite-dimensional vector spaces $V$ there always is such a $w$, and so
the domain of $T^\dagger$ is the entire space. In an infinite dimensional Hilbert space,
however, not all $\langle u, Tv\rangle$ can be written as $\langle w, v\rangle$ with $w$ a finite-length element
of $L^2$. In particular $\delta$-functions are not allowed, but these are exactly what
we would need if we were to express the boundary values appearing in the
integrated out part, $Q(u,v)$, as an inner-product integral. We must therefore
ensure that $u$ is such that $Q(u,v)$ vanishes, but then accept any $u$ with this
property into the domain of $T^\dagger$. What this means in practice is that we look
at the integrated out term $Q(u,v)$ and see what is required of $u$ to make
$Q(u,v)$ zero for any $v$ satisfying the boundary conditions appearing in $D(T)$.
These conditions on $u$ are the adjoint boundary conditions, and define the
domain of $T^\dagger$.

Example: Consider 


$$ T = -i\partial_x, \qquad D(T) = \{y, Ty \in L^2[0,1] : y(1) = 0\}. \tag{4.50} $$

Now,

$$ \int_0^1 dx\,u^*(-i\partial_x v) = -i\left[u^*(1)v(1) - u^*(0)v(0)\right] + \int_0^1 dx\,(-i\partial_x u)^* v $$
$$ = -i\left[u^*(1)v(1) - u^*(0)v(0)\right] + \langle w, v\rangle, \tag{4.51} $$

where $w = -i\partial_x u$. Since $v(x)$ is in the domain of $T$, we have $v(1) = 0$, and
so the first term in the integrated out bit vanishes whatever value we take
for $u(1)$. On the other hand, $v(0)$ could be anything, so to be sure that the
second term vanishes we must demand that $u(0) = 0$. This, then, is the
adjoint boundary condition. It defines the domain of $T^\dagger$:

$$ T^\dagger = -i\partial_x, \qquad D(T^\dagger) = \{y, Ty \in L^2[0,1] : y(0) = 0\}. \tag{4.52} $$


For our problematic operator 


$$ T = -i\partial_x, \qquad D(T) = \{y, Ty \in L^2[0,1] : y(0) = y(1) = 0\}, \tag{4.53} $$

we have

$$ \int_0^1 dx\,u^*(-i\partial_x v) = -i\left[u^* v\right]_0^1 + \int_0^1 dx\,(-i\partial_x u)^* v $$
$$ = 0 + \langle w, v\rangle, \tag{4.54} $$

where again $w = -i\partial_x u$. This time no boundary conditions need be imposed
on $u$ to make the integrated out part vanish. Thus

$$ T^\dagger = -i\partial_x, \qquad D(T^\dagger) = \{y, Ty \in L^2[0,1]\}. \tag{4.55} $$

Although any of these operators “$T = -i\partial_x$” is formally self-adjoint we
have

$$ D(T) \neq D(T^\dagger), \tag{4.56} $$

so $T$ and $T^\dagger$ are not the same operator and none of them is truly self-adjoint.


Exercise 4.4: Consider the differential operator $M = d^4/dx^4$. Find the formal
adjoint of $M$ with respect to the inner product $\langle u, v\rangle = \int u^* v\,dx$, and find
the corresponding surface term $Q[u,v]$. Find the adjoint boundary conditions
defining the domain of $M^\dagger$ for the case


$$ D(M) = \{y, y^{(4)} \in L^2[0,1] : y(0) = y'''(0) = y(1) = y'''(1) = 0\}. $$


4.2.4 Self-adjoint boundary conditions 


A formally self-adjoint operator $T$ is truly self-adjoint only if the domains of
$T^\dagger$ and $T$ coincide. From now on, the unqualified phrase “self-adjoint” will
always mean “truly self-adjoint.” 

Self-adjointness is usually desirable in physics problems. It is therefore 
useful to investigate what boundary conditions lead to self-adjoint operators.
For example, what are the most general boundary conditions we can impose
on $T = -i\partial_x$ if we require the resultant operator to be self-adjoint? Now,

$$ \int_0^1 dx\,u^*(-i\partial_x v) - \int_0^1 dx\,(-i\partial_x u)^* v = -i\left(u^*(1)v(1) - u^*(0)v(0)\right). \tag{4.57} $$

Demanding that the right-hand side be zero gives us, after division by $u^*(0)v(1)$,

$$ \frac{u^*(1)}{u^*(0)} = \frac{v(0)}{v(1)}. \tag{4.58} $$

We require this to be true for any $u$ and $v$ obeying the same boundary
conditions. Since $u$ and $v$ are unrelated, both sides must equal a constant $\kappa$,
and furthermore this constant must obey $\kappa^* = \kappa^{-1}$ in order that $u(1)/u(0)$
be equal to $v(1)/v(0)$. Thus, the boundary condition is

$$ \frac{u(1)}{u(0)} = \frac{v(1)}{v(0)} = e^{i\theta} \tag{4.59} $$

for some real angle $\theta$. The domain is therefore

$$ D(T) = \{y, Ty \in L^2[0,1] : y(1) = e^{i\theta}y(0)\}. \tag{4.60} $$


These are twisted periodic boundary conditions. 
With these generalized periodic boundary conditions, everything we ex- 
pect of a self-adjoint operator actually works: 
i) The functions $u_n = e^{i(2\pi n + \theta)x}$, with $n = \ldots, -2, -1, 0, 1, 2, \ldots$, are eigenfunctions
of $T$ with eigenvalues $k_n = 2\pi n + \theta$.
ii) The eigenvalues are real. 
iii) The eigenfunctions form a complete orthonormal set. 
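
The first and third claims in the list above can be spot-checked numerically. The sketch below is illustrative only: the twist angle `THETA = 0.7`, the helper names, the midpoint quadrature and the central-difference step are all assumptions made here, not part of the text.

```python
import cmath
import math

THETA = 0.7  # hypothetical twist angle, chosen only for this check

def u(n, x, theta=THETA):
    """Candidate eigenfunction exp(i (2 pi n + theta) x)."""
    return cmath.exp(1j * (2.0 * math.pi * n + theta) * x)

def inner(f, g, n_steps=2000):
    """Midpoint-rule approximation to the inner product of f and g on [0, 1]."""
    h = 1.0 / n_steps
    return h * sum(f((k + 0.5) * h).conjugate() * g((k + 0.5) * h)
                   for k in range(n_steps))

def boundary_defect(n, theta=THETA):
    """|y(1) - e^{i theta} y(0)|, which vanishes for functions in D(T)."""
    return abs(u(n, 1.0, theta) - cmath.exp(1j * theta) * u(n, 0.0, theta))

def eigenvalue_defect(n, x=0.3, h=1e-5, theta=THETA):
    """|(-i d/dx) u_n - k_n u_n| at one point, using a central difference."""
    k_n = 2.0 * math.pi * n + theta
    derivative = (u(n, x + h, theta) - u(n, x - h, theta)) / (2.0 * h)
    return abs(-1j * derivative - k_n * u(n, x, theta))

norm_2 = inner(lambda x: u(2, x), lambda x: u(2, x))
cross = inner(lambda x: u(2, x), lambda x: u(-1, x))
print(boundary_defect(3), eigenvalue_defect(3))
print(norm_2, cross)
```

Completeness is not tested by this sketch; it only confirms that the $u_n$ lie in the domain (4.60), are eigenfunctions of $-i\partial_x$ with real eigenvalues, and are mutually orthonormal.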
Because self-adjoint operators possess a complete set of mutually orthogo- 
nal eigenfunctions, they are compatible with the interpretational postulates 
of quantum mechanics, where the square of the inner product of a state 
vector with an eigenstate gives the probability of measuring the associated 
eigenvalue. In quantum mechanics, self-adjoint operators are therefore called 
observables. 
Example: The Sturm-Liouville equation. With

$$ L = \frac{d}{dx}p(x)\frac{d}{dx} + q(x), \qquad x \in [a,b], \tag{4.61} $$

we have

$$ \langle u, Lv\rangle - \langle Lu, v\rangle = \left[p(u^* v' - u^{\prime *}v)\right]_a^b. \tag{4.62} $$

Let us seek to impose boundary conditions separately at the two ends. Thus,
at $x = a$ we want

$$ (u^* v' - u^{\prime *}v)\big|_a = 0, \tag{4.63} $$

or

$$ \frac{u^{\prime *}(a)}{u^*(a)} = \frac{v'(a)}{v(a)}, \tag{4.64} $$

and similarly at $b$. If we want the boundary conditions imposed on $v$ (which
define the domain of $L$) to coincide with those for $u$ (which define the domain
of $L^\dagger$) then we must have

$$ \frac{v'(a)}{v(a)} = \frac{u'(a)}{u(a)} = \tan\theta_a \tag{4.65} $$

for some real angle $\theta_a$, and similar boundary conditions with a $\theta_b$ at $b$. We
can also write these boundary conditions as

$$ \alpha_a y(a) + \beta_a y'(a) = 0, $$
$$ \alpha_b y(b) + \beta_b y'(b) = 0. \tag{4.66} $$


Deficiency indices and self-adjoint extensions 


There is a general theory of self-adjoint boundary conditions, due to Her- 
mann Weyl and John von Neumann. We will not describe this theory in any 
detail, but simply give their recipe for counting the number of parameters 
in the most general self-adjoint boundary condition: To find this number we 
define an initial domain $D_0(L)$ for the operator $L$ by imposing the strictest
possible boundary conditions. This we do by setting to zero the boundary
values of all the $y^{(n)}$ with $n$ less than the order of the equation. Next
count the number of square-integrable eigenfunctions of the resulting adjoint
operator $T^\dagger$ corresponding to eigenvalue $\pm i$. The numbers, $n_+$ and $n_-$, of
these eigenfunctions are called the deficiency indices. If they are not equal
then there is no possible way to make the operator self-adjoint. If they are
equal, $n_+ = n_- = n$, then there is an $n^2$ real-parameter family of self-adjoint
extensions $D(L) \supset D_0(L)$ of the initial tightly-restricted domain.

Example: The sad case of the “radial momentum operator.” We wish to
define the operator $P_r = -i\partial_r$ on the half-line $0 < r < \infty$. We start with the
restrictive domain

$$ P_r = -i\partial_r, \qquad D_0(T) = \{y, P_ry \in L^2[0,\infty] : y(0) = 0\}. \tag{4.67} $$

We then have

$$ P_r^\dagger = -i\partial_r, \qquad D(P_r^\dagger) = \{y, P_r^\dagger y \in L^2[0,\infty]\} \tag{4.68} $$

with no boundary conditions. The equation $P_r^\dagger y = iy$ has a normalizable
solution $y = e^{-r}$. The equation $P_r^\dagger y = -iy$ has no normalizable solution.
The deficiency indices are therefore $n_+ = 1$, $n_- = 0$, and this operator
cannot be rescued and made self-adjoint.

Example: The Schrödinger operator. We now consider $-\partial_x^2$ on the half-line.
Set

$$ T = -\partial_x^2, \qquad D_0(T) = \{y, Ty \in L^2[0,\infty] : y(0) = y'(0) = 0\}. \tag{4.69} $$

We then have

$$ T^\dagger = -\partial_x^2, \qquad D(T^\dagger) = \{y, T^\dagger y \in L^2[0,\infty]\}. \tag{4.70} $$

Again $T^\dagger$ comes with no boundary conditions. The eigenvalue equation
$T^\dagger y = iy$ has one normalizable solution $y(x) = e^{(i-1)x/\sqrt{2}}$, and the equation
$T^\dagger y = -iy$ also has one normalizable solution $y(x) = e^{-(i+1)x/\sqrt{2}}$. The
deficiency indices are therefore $n_+ = n_- = 1$. The Weyl-von Neumann theory
now says that, by relaxing the restrictive conditions $y(0) = y'(0) = 0$, we
can extend the domain of definition of the operator to find a one-parameter
family of self-adjoint boundary conditions. These will be the conditions
$y'(0)/y(0) = \tan\theta$ that we found above.
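
For constant-coefficient operators on the half-line, the counting in the two examples above reduces to finding exponents $k$ for which $e^{kx}$ solves the eigenvalue equation and decays. The following sketch reproduces that count under this assumption; the function names are hypothetical and the code is an illustration of the recipe, not a general deficiency-index algorithm.

```python
import cmath
import math

def first_order_exponents(eigenvalue):
    """Exponent k of y = e^{k r} solving -i y' = eigenvalue * y."""
    return [1j * eigenvalue]

def second_order_exponents(eigenvalue):
    """Both exponents k of y = e^{k x} solving -y'' = eigenvalue * y."""
    root = cmath.sqrt(-eigenvalue)
    return [root, -root]

def count_normalizable(exponents):
    """e^{k x} is square-integrable on [0, infinity) exactly when Re k < 0."""
    return sum(1 for k in exponents if k.real < 0)

# Deficiency indices (n_plus, n_minus) for the two half-line examples.
radial_indices = (count_normalizable(first_order_exponents(1j)),
                  count_normalizable(first_order_exponents(-1j)))
schrodinger_indices = (count_normalizable(second_order_exponents(1j)),
                       count_normalizable(second_order_exponents(-1j)))

# The decaying exponent for eigenvalue +i should be (i - 1)/sqrt(2).
decaying_plus = [k for k in second_order_exponents(1j) if k.real < 0][0]

print(radial_indices, schrodinger_indices, decaying_plus)
```

Unequal indices for $-i\partial_r$ and equal indices for $-\partial_x^2$ are what distinguish the operator that cannot be made self-adjoint from the one that admits a one-parameter family of extensions.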

If we consider the operator $-\partial_x^2$ on the finite interval $[a,b]$, then both
solutions of $(T^\dagger \pm i)y = 0$ are normalizable, and the deficiency indices will
be $n_+ = n_- = 2$. There should therefore be $2^2 = 4$ real parameters in the
self-adjoint boundary conditions. This is a larger class than those we found
in (4.66), because it includes generalized boundary conditions of the form

$$ B_1[y] = \alpha_{11}y(a) + \alpha_{12}y'(a) + \beta_{11}y(b) + \beta_{12}y'(b) = 0, $$
$$ B_2[y] = \alpha_{21}y(a) + \alpha_{22}y'(a) + \beta_{21}y(b) + \beta_{22}y'(b) = 0. $$


Figure 4.1: Heterojunction and wavefunctions.


Physics application: semiconductor heterojunction 


We now demonstrate why we have spent so much time on identifying self- 
adjoint boundary conditions: the technique is important in practical physics 
problems. 

A heterojunction is an atomically smooth interface between two related
semiconductors, such as GaAs and Al$_x$Ga$_{1-x}$As, which typically possess
different band-masses. We wish to describe the conduction electrons by an
effective Schrödinger equation containing these band masses. What matching
condition should we impose on the wavefunction $\psi(x)$ at the interface
between the two materials? A first guess is that the wavefunction must be
continuous, but this is not correct because the “wavefunction” in an
effective-mass band-theory Hamiltonian is not the actual wavefunction (which is
continuous) but instead a slowly varying envelope function multiplying a Bloch
wavefunction. The Bloch function is rapidly varying, fluctuating strongly
on the scale of a single atom. Because the Bloch form of the solution is no
longer valid at a discontinuity, the envelope function is not even defined in
the neighbourhood of the interface, and certainly has no reason to be
continuous. There must still be some linear relation between the $\psi$'s in the two
materials, but finding it will involve a detailed calculation on the atomic
scale. In the absence of these calculations, we must use general principles to
constrain the form of the relation. What are these principles?

We know that, were we to do the atomic-scale calculation, the resulting
connection between the right and left wavefunctions would:

• be linear,

• involve no more than $\psi(x)$ and its first derivative $\psi'(x)$,

• make the Hamiltonian into a self-adjoint operator.

We want to find the most general connection formula compatible with these
principles. The first two are easy to satisfy. We therefore investigate what
matching conditions are compatible with self-adjointness.

Suppose that the band masses are $m_L$ and $m_R$, so that

$$ H = -\frac{1}{2m_L}\frac{d^2}{dx^2} + V_L(x), \qquad x < 0, $$
$$ \phantom{H} = -\frac{1}{2m_R}\frac{d^2}{dx^2} + V_R(x), \qquad x > 0. \tag{4.71} $$

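Integrating by parts, and keeping the terms at the interface gives us

$$ \langle\psi_1, H\psi_2\rangle - \langle H\psi_1, \psi_2\rangle = -\frac{1}{2m_L}\left\{\psi_{1L}^*\psi_{2L}' - \psi_{1L}^{\prime *}\psi_{2L}\right\} + \frac{1}{2m_R}\left\{\psi_{1R}^*\psi_{2R}' - \psi_{1R}^{\prime *}\psi_{2R}\right\}. \tag{4.72} $$

Here, $\psi_{L,R}$ refers to the boundary values of $\psi$ immediately to the left or right
of the junction, respectively. Now we impose general linear homogeneous
boundary conditions on $\psi_2$:

$$ \begin{pmatrix}\psi_{2L}\\ \psi_{2L}'\end{pmatrix} = \begin{pmatrix}a & b\\ c & d\end{pmatrix}\begin{pmatrix}\psi_{2R}\\ \psi_{2R}'\end{pmatrix}. \tag{4.73} $$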


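This relation involves four complex, and therefore eight real, parameters.
Demanding that

$$ \langle\psi_1, H\psi_2\rangle = \langle H\psi_1, \psi_2\rangle, \tag{4.74} $$

we find

$$ \frac{1}{2m_L}\left\{\psi_{1L}^*(c\psi_{2R} + d\psi_{2R}') - \psi_{1L}^{\prime *}(a\psi_{2R} + b\psi_{2R}')\right\} = \frac{1}{2m_R}\left\{\psi_{1R}^*\psi_{2R}' - \psi_{1R}^{\prime *}\psi_{2R}\right\}, \tag{4.75} $$

and this must hold for arbitrary $\psi_{2R}$, $\psi_{2R}'$, so, picking off the coefficients of
these expressions and complex conjugating, we find

$$ \begin{pmatrix}\psi_{1R}\\ \psi_{1R}'\end{pmatrix} = \left(\frac{m_R}{m_L}\right)\begin{pmatrix}d^* & -b^*\\ -c^* & a^*\end{pmatrix}\begin{pmatrix}\psi_{1L}\\ \psi_{1L}'\end{pmatrix}. \tag{4.76} $$

Because we wish the domain of $H^\dagger$ to coincide with that of $H$, these must
be the same conditions that we imposed on $\psi_2$. Thus we must have

$$ \begin{pmatrix}a & b\\ c & d\end{pmatrix}^{-1} = \left(\frac{m_R}{m_L}\right)\begin{pmatrix}d^* & -b^*\\ -c^* & a^*\end{pmatrix}. \tag{4.77} $$

Since

$$ \begin{pmatrix}a & b\\ c & d\end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix}, \tag{4.78} $$

we see that this requires

$$ \begin{pmatrix}a & b\\ c & d\end{pmatrix} = e^{i\phi}\sqrt{\frac{m_L}{m_R}}\begin{pmatrix}A & B\\ C & D\end{pmatrix}, \tag{4.79} $$

where $\phi$, $A$, $B$, $C$, $D$ are real, and $AD - BC = 1$. Demanding self-adjointness
has therefore cut the original eight real parameters down to four. These
can be determined either by experiment or by performing the microscopic
calculation.³ Note that $4 = 2^2$, a perfect square, as required by the
Weyl-von Neumann theory.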
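
A matching matrix of the form (4.79) can be tested directly: if the left-hand boundary data are obtained from the right-hand data by such a matrix, the interface term in (4.72) should vanish for every pair of wavefunctions. The sketch below checks this for one arbitrary choice of numbers. The band masses, the phase, the real unimodular matrix entries and the boundary values are all hypothetical, as are the function names.

```python
import cmath
import math

def matching_matrix(phi, m_L, m_R, A, B, C, D):
    """Matrix of the form (4.79); A, B, C, D must be real with AD - BC = 1."""
    if abs(A * D - B * C - 1.0) > 1e-12:
        raise ValueError("AD - BC must equal 1")
    s = cmath.exp(1j * phi) * math.sqrt(m_L / m_R)
    return [[s * A, s * B], [s * C, s * D]]

def apply(M, pair):
    """Apply a 2x2 matrix to a (value, derivative) pair."""
    return (M[0][0] * pair[0] + M[0][1] * pair[1],
            M[1][0] * pair[0] + M[1][1] * pair[1])

def bracket(p1, p2):
    """psi_1^* psi_2' - psi_1'^* psi_2 for (value, derivative) pairs."""
    return p1[0].conjugate() * p2[1] - p1[1].conjugate() * p2[0]

def interface_term(psi1_L, psi1_R, psi2_L, psi2_R, m_L, m_R):
    """Interface contribution to <psi1, H psi2> - <H psi1, psi2>."""
    return (-bracket(psi1_L, psi2_L) / (2.0 * m_L)
            + bracket(psi1_R, psi2_R) / (2.0 * m_R))

m_L, m_R = 0.067, 0.092                            # hypothetical band masses
M = matching_matrix(0.3, m_L, m_R, 2.0, 3.0, 1.0, 2.0)

psi1_R = (0.4 + 1.1j, -0.7 + 0.2j)                 # arbitrary right-hand data
psi2_R = (1.3 - 0.5j, 0.6 + 0.9j)
psi1_L = apply(M, psi1_R)                          # left-hand data from (4.73)
psi2_L = apply(M, psi2_R)

residual = interface_term(psi1_L, psi1_R, psi2_L, psi2_R, m_L, m_R)
print(abs(residual))  # should be zero up to rounding
```

The cancellation works because a real matrix with unit determinant preserves the antisymmetric bracket, while the prefactor $\sqrt{m_L/m_R}$ compensates for the different masses on the two sides.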


Exercise 4.5: Consider the Schrödinger operator $H = -\partial_x^2$ on the interval
$[0,1]$. Show that the most general self-adjoint boundary condition applicable
to $H$ can be written as

$$ \begin{pmatrix}\varphi(0)\\ \varphi'(0)\end{pmatrix} = e^{i\phi}\begin{pmatrix}a & b\\ c & d\end{pmatrix}\begin{pmatrix}\varphi(1)\\ \varphi'(1)\end{pmatrix}, $$

where $\phi$, $a$, $b$, $c$, $d$ are real and $ad - bc = 1$. Consider $H$ as the quantum
Hamiltonian of a particle on a ring constructed by attaching $x = 0$ to $x = 1$.
Show that the self-adjoint boundary condition found above leads to unitary


scattering at the point of join. Does the most general unitary point-scattering 
matrix correspond to the most general self-adjoint boundary condition? 


4.3 Completeness of eigenfunctions 


Now that we have a clear understanding of what it means to be self-adjoint, 
we can reiterate the basic claim: an operator $T$ that is self-adjoint with
respect to an $L^2[a,b]$ inner product possesses a complete set of mutually
orthogonal eigenfunctions. The proof that the eigenfunctions are orthogonal
is identical to that for finite matrices. We will sketch a proof of the
completeness of the eigenfunctions of the Sturm-Liouville operator in the next
section.

The set of eigenvalues is, with some mathematical cavils, called the spectrum
of $T$. It is usually denoted by $\sigma(T)$. An eigenvalue is said to belong to
the point spectrum when its associated eigenfunction is normalizable, i.e. is
a bona-fide member of $L^2[a,b]$ having a finite length. Usually (but not always)
the eigenvalues of the point spectrum form a discrete set, and so the
point spectrum is also known as the discrete spectrum. When the operator
acts on functions on an infinite interval, the eigenfunctions may fail to
be normalizable. The associated eigenvalues are then said to belong to the
continuous spectrum. Sometimes, e.g. the hydrogen atom, the spectrum is
partly discrete and partly continuous. There is also something called the
residual spectrum, but this does not occur for self-adjoint operators.

³ For example, see: T. Ando, S. Mori, Surface Science 113 (1982) 124.


4.3.1 Discrete spectrum 


The simplest problems have a purely discrete spectrum. We have eigenfunctions
$\phi_n(x)$ such that

$$ T\phi_n(x) = \lambda_n\phi_n(x), \tag{4.80} $$

where $n$ is an integer. After multiplication by suitable constants, the $\phi_n$ are
orthonormal,

$$ \int \phi_m^*(x)\phi_n(x)\,dx = \delta_{mn}, \tag{4.81} $$

and complete. We can express the completeness condition as the statement
that

$$ \sum_n \phi_n(x)\phi_n^*(x') = \delta(x - x'). \tag{4.82} $$

If we take this representation of the delta function and multiply it by $f(x')$
and integrate over $x'$, we find

$$ f(x) = \sum_n \phi_n(x)\int \phi_n^*(x')f(x')\,dx'. \tag{4.83} $$

So,

$$ f(x) = \sum_n a_n\phi_n(x) \tag{4.84} $$

with

$$ a_n = \int \phi_n^*(x')f(x')\,dx'. \tag{4.85} $$


This means that if we can expand a delta function in terms of the $\phi_n(x)$, we
can expand any (square integrable) function.


Figure 4.2: The sum $\sum_{n=1}^{70} 2\sin(n\pi x)\sin(n\pi x')$ for $x' = 0.4$. Take note of
the very disparate scales on the horizontal and vertical axes.


Warning: The convergence of the series $\sum_n \phi_n(x)\phi_n^*(x')$ to $\delta(x - x')$ is
neither pointwise nor in the $L^2$ sense. The sum tends to a limit only in the
sense of a distribution, meaning that we must multiply the partial sums by
a smooth test function and integrate over $x$ before we have something that
actually converges in any meaningful manner. As an illustration consider our
favourite orthonormal set: $\phi_n(x) = \sqrt{2}\sin(n\pi x)$ on the interval $[0,1]$. A plot
of the first 70 terms in the sum

$$ \sum_{n=1}^{\infty} \sqrt{2}\sin(n\pi x)\,\sqrt{2}\sin(n\pi x') = \delta(x - x') $$

is shown in figure 4.2. The “wiggles” on both sides of the spike at $x = x'$
do not decrease in amplitude as the number of terms grows. They do,
however, become of higher and higher frequency. When multiplied by a
smooth function and integrated, the contributions from adjacent positive and
negative wiggle regions tend to cancel, and it is only after this integration
that the sum tends to zero away from the spike at $x = x'$.
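
The distinction drawn in the warning can be seen numerically. The sketch below evaluates the partial sums for $x' = 0.4$, measures the size of the wiggles in a window away from the spike, and then smears the 70-term sum against a smooth test function. The test function $f(x) = x(1-x)$, the window $[0.6, 0.9]$, the quadrature and all function names are assumptions made here for illustration.

```python
import math

X_PRIME = 0.4

def partial_sum(x, n_terms, x_prime=X_PRIME):
    """Sum over n = 1..n_terms of 2 sin(n pi x) sin(n pi x')."""
    return sum(2.0 * math.sin(n * math.pi * x) * math.sin(n * math.pi * x_prime)
               for n in range(1, n_terms + 1))

def wiggle_amplitude(n_terms, lo=0.6, hi=0.9, n_samples=3000):
    """Largest |partial sum| in a window that excludes the spike at x'."""
    return max(abs(partial_sum(lo + (hi - lo) * k / n_samples, n_terms))
               for k in range(n_samples + 1))

def smeared(test_function, n_terms, n_steps=4000):
    """Midpoint-rule integral over [0, 1] of partial_sum(x) * test_function(x)."""
    h = 1.0 / n_steps
    return h * sum(partial_sum((k + 0.5) * h, n_terms) * test_function((k + 0.5) * h)
                   for k in range(n_steps))

def f(x):
    """Smooth test function vanishing at both endpoints."""
    return x * (1.0 - x)

amp_35 = wiggle_amplitude(35)
amp_70 = wiggle_amplitude(70)
smeared_70 = smeared(f, 70)

print(amp_35, amp_70)          # the wiggles do not die away as terms are added
print(smeared_70, f(X_PRIME))  # the smeared sum approaches f(x')
```

Only the smeared quantity settles down to a limit, namely the value of the test function at $x'$, which is what convergence in the sense of distributions means.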


Rayleigh-Ritz and completeness 


For the Schrödinger eigenvalue problem

$$ Ly = -y'' + q(x)y = \lambda y, \qquad x \in [a,b], \tag{4.86} $$

the large eigenvalues are $\lambda_n \approx n^2\pi^2/(a - b)^2$. This is because the term $qy$
eventually becomes negligible compared to $\lambda y$, and we can then solve the
equation with sines and cosines. We see that there is no upper limit to
the magnitude of the eigenvalues. The eigenvalues of the Sturm-Liouville
problem

$$ Ly = -(py')' + qy = \lambda y, \qquad x \in [a,b], \tag{4.87} $$


are similarly unbounded. We will use this unboundedness of the spectrum to 
make an estimate of the rate of convergence of the eigenfunction expansion 
for functions in the domain of L, and extend this result to prove that the 
eigenfunctions form a complete set. 

We know from chapter one that the Sturm-Liouville eigenvalues are the
stationary values of $\langle y, Ly\rangle$ when the function $y$ is constrained to have unit
length, $\langle y, y\rangle = 1$. The lowest eigenvalue, $\lambda_0$, is therefore given by

$$ \lambda_0 = \inf_{y \in D(L)} \frac{\langle y, Ly\rangle}{\langle y, y\rangle}. \tag{4.88} $$

As the variational principle, this formula provides a well-known method of
obtaining approximate ground state energies in quantum mechanics. Part of
its effectiveness comes from the stationary nature of $\langle y, Ly\rangle$ at the minimum:
a crude approximation to $y$ often gives a tolerably good approximation to $\lambda_0$.
In the wider world of eigenvalue problems, the variational principle is named
after Rayleigh and Ritz.⁴

Suppose we have already found the first $n$ normalized eigenfunctions
$y_0, y_1, \ldots, y_{n-1}$. Let the space spanned by these functions be $V_n$. Then an
obvious extension of the variational principle gives

$$ \lambda_n = \inf_{y \in V_n^\perp} \frac{\langle y, Ly\rangle}{\langle y, y\rangle}. \tag{4.89} $$

We now exploit this variational estimate to show that if we expand an arbitrary
$y$ in the domain of $L$ in terms of the full set of eigenfunctions $y_m$,

$$ y = \sum_{m=0}^{\infty} a_my_m, \tag{4.90} $$

where

$$ a_m = \langle y_m, y\rangle, \tag{4.91} $$

then the sum does indeed converge to $y$.

⁴ J. W. Strutt (later Lord Rayleigh), “In Finding the Correction for the Open End of
an Organ-Pipe.” Phil. Trans. 161 (1870) 77; W. Ritz, “Über eine neue Methode zur
Lösung gewisser Variationsprobleme der mathematischen Physik.” J. reine angew. Math.
135 (1908).
Let 


be the residual error after the first n terms. By definition, h, € V,+. Let 
us assume that we have adjusted, by adding a constant to q if necessary, L 
so that all the A,, are positive. This adjustment will not affect the y,,. We 
expand out 


(hin; Lhm) = (y, Ly) — To Melon (4.93) 


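where we have made use of the orthonormality of the $y_m$. The subtracted
sum is guaranteed positive, so

$$\langle h_n, Lh_n\rangle \le \langle y, Ly\rangle. \tag{4.94}$$

Combining this inequality with Rayleigh-Ritz tells us that

$$\frac{\langle y, Ly\rangle}{\langle h_n, h_n\rangle} \ge \frac{\langle h_n, Lh_n\rangle}{\langle h_n, h_n\rangle} \ge \lambda_n. \tag{4.95}$$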


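In other words

$$\frac{\langle y, Ly\rangle}{\lambda_n} \ge \Big\| y - \sum_{m=0}^{n-1} a_m y_m \Big\|^2. \tag{4.96}$$

Since $\langle y, Ly\rangle$ is independent of $n$, and $\lambda_n \to \infty$, we have $\| y - \sum_{m=0}^{n-1} a_m y_m \|^2 \to 0$.
Thus the eigenfunction expansion indeed converges to $y$, and does so faster
than $\lambda_n^{-1}$ goes to zero.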
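The bound (4.96) can be checked numerically on a simple case. The sketch below is an illustration under assumptions that are not in the text: it takes $L = -d^2/dx^2$ on $[0,\pi]$ with Dirichlet conditions, so that $y_m = \sqrt{2/\pi}\,\sin((m+1)x)$ and $\lambda_m = (m+1)^2$, and expands the trial function $y = x(\pi - x)$. The function names are hypothetical.

```python
import math

# Assumed example (not from the text): L = -d^2/dx^2 on [0, pi], y(0) = y(pi) = 0.
# Normalized eigenfunctions y_m = sqrt(2/pi) sin((m+1) x), eigenvalues (m+1)^2.
def eigenvalue(m):
    return (m + 1) ** 2

def coefficient(m):
    # a_m = <y_m, y> for y = x (pi - x): the overlap integral of y with
    # sin(k x) is 4/k^3 for odd k = m + 1 and zero for even k.
    k = m + 1
    return math.sqrt(2 / math.pi) * (4 / k**3 if k % 2 else 0.0)

NORM_SQ = math.pi**5 / 30   # <y, y>
Y_LY = math.pi**3 / 3       # <y, Ly>, using Ly = -y'' = 2

def residual_sq(n):
    # ||y - sum_{m<n} a_m y_m||^2, evaluated using orthonormality
    return NORM_SQ - sum(coefficient(m) ** 2 for m in range(n))

for n in range(6):
    # left column: squared residual; right column: the bound <y, Ly>/lambda_n
    print(n, residual_sq(n), Y_LY / eigenvalue(n))
```

In this example the residual falls off much faster than the bound, which is consistent with (4.96) being an upper estimate rather than a sharp rate.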

Our estimate of the rate of convergence applies only to the expansion of
functions $y$ for which $\langle y, Ly\rangle$ is defined, i.e. to functions $y \in \mathcal{D}(L)$. The
domain $\mathcal{D}(L)$ is always a dense subset of the entire Hilbert space $L^2[a,b]$,
however, and, since a dense subset of a dense subset is also dense in the larger
space, we have shown that the linear span of the eigenfunctions is a dense
subset of $L^2[a,b]$. Combining this observation with the alternative definition
of completeness in 2.2.3, we see that the eigenfunctions do indeed form a
complete orthonormal set. Any square integrable function therefore has a
convergent expansion in terms of the $y_m$, but the rate of convergence may
well be slower than that for functions $y \in \mathcal{D}(L)$.


Operator methods


Sometimes there are tricks for solving the eigenvalue problem.
Example: Quantum Harmonic Oscillator. Consider the operator

$$H = (-\partial_x + x)(\partial_x + x) + 1 = -\partial_x^2 + x^2. \tag{4.97}$$

This is in the form $Q^\dagger Q + 1$, where $Q = (\partial_x + x)$, and $Q^\dagger = (-\partial_x + x)$ is its
formal adjoint. If we write these operators in the opposite order we have

$$QQ^\dagger = (\partial_x + x)(-\partial_x + x) = -\partial_x^2 + x^2 + 1 = H + 1. \tag{4.98}$$


Now, if $\psi$ is an eigenfunction of $Q^\dagger Q$ with non-zero eigenvalue $\lambda$ then $Q\psi$ is
an eigenfunction of $QQ^\dagger$ with the same eigenvalue. This is because

$$Q^\dagger Q\psi = \lambda\psi \tag{4.99}$$

implies that

$$Q(Q^\dagger Q\psi) = \lambda Q\psi, \tag{4.100}$$

or

$$QQ^\dagger(Q\psi) = \lambda(Q\psi). \tag{4.101}$$

The only way that $Q\psi$ can fail to be an eigenfunction of $QQ^\dagger$ is if it happens
that $Q\psi = 0$, but this implies that $Q^\dagger Q\psi = 0$ and so the eigenvalue was zero.
Conversely, if the eigenvalue is zero then

$$0 = \langle\psi, Q^\dagger Q\psi\rangle = \langle Q\psi, Q\psi\rangle, \tag{4.102}$$

and so $Q\psi = 0$. In this way, we see that $Q^\dagger Q$ and $QQ^\dagger$ have exactly the
same spectrum, with the possible exception of any zero eigenvalue.
Now notice that $Q^\dagger Q$ does have a zero eigenvalue because

$$\psi_0 = e^{-\frac{1}{2}x^2} \tag{4.103}$$

obeys $Q\psi_0 = 0$ and is normalizable. The operator $QQ^\dagger$, considered as an
operator on $L^2[-\infty,\infty]$, does not have a zero eigenvalue because this would
require $Q^\dagger\psi = 0$, and so

$$\psi = e^{+\frac{1}{2}x^2}, \tag{4.104}$$

which is not normalizable, and so not an element of $L^2[-\infty,\infty]$.
Since

$$H = Q^\dagger Q + 1 = QQ^\dagger - 1, \tag{4.105}$$

we see that $\psi_0$ is an eigenfunction of $H$ with eigenvalue 1, and so an
eigenfunction of $QQ^\dagger$ with eigenvalue 2. Hence $Q^\dagger\psi_0$ is an eigenfunction of $Q^\dagger Q$
with eigenvalue 2 and so an eigenfunction of $H$ with eigenvalue 3. Proceeding
in this way we find that

$$\psi_n = (Q^\dagger)^n\psi_0 \tag{4.106}$$

is an eigenfunction of $H$ with eigenvalue $2n + 1$.
Since $Q^\dagger = -e^{\frac{1}{2}x^2}\,\partial_x\, e^{-\frac{1}{2}x^2}$ we can write

$$\psi_n(x) = H_n(x)\,e^{-\frac{1}{2}x^2}, \tag{4.107}$$

where

$$H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}e^{-x^2} \tag{4.108}$$

are the Hermite Polynomials.

This is a useful technique for any second-order operator that can be fac- 
torized — and a surprising number of the equations for “special functions” 
can be. You will see it later, both in the exercises and in connection with 
Bessel functions. 
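The ladder construction above can be tried out with plain polynomial arithmetic. The sketch below assumes the representation $\psi = p(x)\,e^{-x^2/2}$, with $p$ stored as a list of coefficients in increasing powers of $x$. In this representation $Q^\dagger$ acts as $p \mapsto 2xp - p'$ and $H$ acts as $p \mapsto -p'' + 2xp' + p$; both rules follow by differentiating the product, and the helper names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def add(p, q, sign=1):
    """Return p + sign*q for coefficient lists of possibly different length."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a + sign * b for a, b in zip(p, q)])

def deriv(p):
    """Coefficient list of dp/dx."""
    return trim([k * c for k, c in enumerate(p)][1:] or [0])

def two_x_times(p):
    """Coefficient list of 2 x p(x)."""
    return [0] + [2 * c for c in p]

def raise_once(p):
    """Q-dagger acting on p(x) exp(-x^2/2): p -> 2 x p - p'."""
    return add(two_x_times(p), deriv(p), sign=-1)

def apply_H(p):
    """H = -d^2/dx^2 + x^2 acting on p(x) exp(-x^2/2): p -> -p'' + 2 x p' + p."""
    d1 = deriv(p)
    d2 = deriv(d1)
    return add(add(two_x_times(d1), p), d2, sign=-1)

def hermite(n):
    """Polynomial factor of (Q-dagger)^n psi_0, starting from p = 1."""
    p = [1]
    for _ in range(n):
        p = raise_once(p)
    return p

for n in range(4):
    print(n, hermite(n), apply_H(hermite(n)))
```

Each application of `apply_H` should return $2n+1$ times the input polynomial, which is the eigenvalue statement following (4.106).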


Exercise 4.6: Show that we have found all the eigenfunctions and eigenvalues
of $H = -\partial_x^2 + x^2$. Hint: Show that $Q$ lowers the eigenvalue by 2 and use the
fact that $Q^\dagger Q$ cannot have negative eigenvalues.


Problem 4.7: Schrödinger equations of the form

$$-\frac{d^2\psi}{dx^2} - l(l+1)\,\mathrm{sech}^2 x\,\psi = E\psi$$

are known as Pöschl-Teller equations. By setting $u = l\tanh x$ and following
the strategy of this problem one may relate solutions for $l$ to those for $l-1$ and
so find all bound states and scattering eigenfunctions for any integer $l$.


a) Suppose that we know that $\psi_0 = \exp\{-\int^x u(x')\,dx'\}$ is a solution of

$$L\psi_0 = \left(-\frac{d^2}{dx^2} + W(x)\right)\psi_0 = 0.$$

Show that $L$ can be written as $L = M^\dagger M$ where

$$M = \left(\frac{d}{dx} + u(x)\right), \qquad M^\dagger = \left(-\frac{d}{dx} + u(x)\right),$$

the adjoint being taken with respect to the product $\langle u, v\rangle = \int u^* v\,dx$.

b) Now assume $L$ is acting on functions on $[-\infty,\infty]$ and that we do not have
to worry about boundary conditions. Show that given an eigenfunction
$\psi_-$ obeying $M^\dagger M\psi_- = \lambda\psi_-$ we can multiply this equation on the left
by $M$ and so find an eigenfunction $\psi_+$ with the same eigenvalue for the
differential operator

$$L' = MM^\dagger = \left(\frac{d}{dx} + u(x)\right)\left(-\frac{d}{dx} + u(x)\right)$$

and vice-versa. Show that this correspondence $\psi_- \leftrightarrow \psi_+$ will fail if, and
only if, $\lambda = 0$.

c) Apply the strategy from part b) in the case $u(x) = \tanh x$, where one of the
two differential operators $M^\dagger M$, $MM^\dagger$ is (up to an additive constant)

$$H = -\frac{d^2}{dx^2} - 2\,\mathrm{sech}^2 x.$$

Show that $H$ has eigenfunctions of the form $\psi_k = e^{ikx}P(\tanh x)$ and
eigenvalue $E = k^2$ for any $k$ in the range $-\infty < k < \infty$. The function
$P(\tanh x)$ is a polynomial in $\tanh x$ which you should be able to find
explicitly. By thinking about the exceptional case $\lambda = 0$, show that $H$
has an eigenfunction $\psi_0(x)$, with eigenvalue $E = -1$, that tends rapidly
to zero as $x \to \pm\infty$. Observe that there is no corresponding eigenfunction
for the other operator of the pair.


4.3.2 Continuous spectrum 


Rather than give a formal discussion, we will illustrate this subject with some
examples drawn from quantum mechanics.
The simplest example is the free particle on the real line. We have

$$H = -\partial_x^2. \tag{4.109}$$

We eventually want to apply this to functions on the entire real line, but we
will begin with the interval $[-L/2, L/2]$, and then take the limit $L \to \infty$.
The operator $H$ has formal eigenfunctions

$$\varphi_k(x) = e^{ikx}, \tag{4.110}$$


corresponding to eigenvalues $\lambda = k^2$. Suppose we impose periodic boundary
conditions at $x = \pm L/2$:

$$\varphi_k(-L/2) = \varphi_k(+L/2). \tag{4.111}$$

This selects $k_n = 2\pi n/L$, where $n$ is any positive, negative or zero integer,
and allows us to find the normalized eigenfunctions

$$\chi_n(x) = \frac{1}{\sqrt{L}}\,e^{ik_n x}. \tag{4.112}$$

The completeness condition is

$$\sum_{n=-\infty}^{\infty} \frac{1}{L}\,e^{ik_n x}e^{-ik_n x'} = \delta(x - x'), \qquad x, x' \in [-L/2, L/2]. \tag{4.113}$$


As $L$ becomes large, the eigenvalues become so close that they can hardly be
distinguished; hence the name continuous spectrum,⁵ and the spectrum $\sigma(H)$
becomes the entire positive real line. In this limit, the sum on $n$ becomes an
integral

$$\sum_{n=-\infty}^{\infty}\{\ldots\} \to \int dn\,\{\ldots\} = \int dk\left(\frac{dn}{dk}\right)\{\ldots\}, \tag{4.114}$$

where

$$\frac{dn}{dk} = \frac{L}{2\pi} \tag{4.115}$$

is called the (momentum) density of states. If we divide this by $L$ to get a
density of states per unit length, we get an $L$ independent "finite" quantity,
the local density of states. We will often write

$$\frac{dn}{dk} = \rho(k). \tag{4.116}$$

If we express the density of states in terms of the eigenvalue $\lambda$ then, by
an abuse of notation, we have

$$\rho(\lambda) \equiv \frac{dn}{d\lambda} = \frac{L}{2\pi\sqrt{\lambda}}. \tag{4.117}$$


⁵When $L$ is strictly infinite, $\varphi_k(x)$ is no longer normalizable. Mathematicians do not
allow such un-normalizable functions to be considered as true eigenfunctions, and so a
point in the continuous spectrum is not, to them, actually an eigenvalue. Instead, they
say that a point $\lambda$ lies in the continuous spectrum if for any $\epsilon > 0$ there exists an
approximate eigenfunction $\varphi_\epsilon$ such that $\|\varphi_\epsilon\| = 1$, but $\|L\varphi_\epsilon - \lambda\varphi_\epsilon\| < \epsilon$. This is not a
profitable definition for us. We prefer to regard non-normalizable wavefunctions as being
distributions in our rigged Hilbert space.

Note that

$$\frac{dn}{d\lambda} = 2\,\frac{dn}{dk}\,\frac{dk}{d\lambda}, \tag{4.118}$$

which looks a bit weird, but remember that two states, $\pm k_n$, correspond to
the same $\lambda$ and that the symbols

$$\frac{dn}{dk}, \qquad \frac{dn}{d\lambda} \tag{4.119}$$

are ratios of measures, i.e. Radon-Nikodym derivatives, not ordinary
derivatives.
In the $L \to \infty$ limit, the completeness condition becomes

$$\int_{-\infty}^{\infty} \frac{dk}{2\pi}\,e^{ik(x-x')} = \delta(x - x'), \tag{4.120}$$

and the length $L$ has disappeared.


Suppose that we now apply boundary conditions $y = 0$ on $x = \pm L/2$.
The normalized eigenfunctions are then

$$\chi_n = \sqrt{\frac{2}{L}}\,\sin k_n(x + L/2), \tag{4.121}$$

where $k_n = n\pi/L$. We see that the allowed $k$'s are twice as close together as
they were with periodic boundary conditions, but now $n$ is restricted to being
a positive non-zero integer. The momentum density of states is therefore

$$\rho(k) = \frac{dn}{dk} = \frac{L}{\pi}, \tag{4.122}$$

which is twice as large as in the periodic case, but the eigenvalue density of
states is

$$\rho(\lambda) = \frac{L}{2\pi\sqrt{\lambda}}, \tag{4.123}$$

which is exactly the same as before.
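The agreement of the two eigenvalue densities can be illustrated by direct counting. The sketch below counts the eigenvalues $k_n^2$ lying below a cutoff for each boundary condition and compares both counts with the integral of $\rho(\lambda) = L/(2\pi\sqrt{\lambda})$, which is $L\sqrt{\lambda_{\max}}/\pi$. The cutoff `lam_max` and the function names are assumptions made for illustration.

```python
import math

def count_periodic(L, lam_max):
    """Number of k_n = 2 pi n / L (n any integer) with k_n^2 <= lam_max."""
    n_max = math.floor(math.sqrt(lam_max) * L / (2 * math.pi))
    return 2 * n_max + 1          # n runs over -n_max, ..., n_max

def count_dirichlet(L, lam_max):
    """Number of k_n = n pi / L (n = 1, 2, ...) with k_n^2 <= lam_max."""
    return math.floor(math.sqrt(lam_max) * L / math.pi)

def integrated_density(L, lam_max):
    """Integral of L / (2 pi sqrt(lam)) from 0 to lam_max."""
    return L * math.sqrt(lam_max) / math.pi

for L in (10.0, 100.0, 1000.0):
    print(L, count_periodic(L, 2.0), count_dirichlet(L, 2.0),
          integrated_density(L, 2.0))
```

The two counts differ from the integrated density, and from each other, by at most a state or two, however large $L$ is taken.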

That the number of states per unit energy per unit volume does not
depend on the boundary conditions at infinity makes physical sense: no
local property of the sublunary realm should depend on what happens in
the sphere of fixed stars. This point was not fully grasped by physicists,
however, until Rudolf Peierls⁶ explained that the quantum particle had to
actually travel to the distant boundary and back before the precise nature
of the boundary could be felt. This journey takes time $T$ (depending on
the particle's energy) and from the energy-time uncertainty principle, we
can distinguish one boundary condition from another only by examining the
spectrum with an energy resolution finer than $\hbar/T$. Neither the distance nor
the nature of the boundary can affect the coarse details, such as the local
density of states.

The dependence of the spectrum of a general differential operator on
boundary conditions was investigated by Hermann Weyl. Weyl distinguished
two classes of singular boundary points: limit-circle, where the spectrum
depends on the choice of boundary conditions, and limit-point, where it does
not. For the Schrödinger operator, the point at infinity, which is "singular"
simply because it is at infinity, is in the limit-point class. We will discuss
Weyl's theory of singular endpoints in chapter 8.


Phase-shifts 


Consider the eigenvalue problem

$$\left(-\frac{d^2}{dr^2} + V(r)\right)\psi = E\psi \tag{4.124}$$

on the interval $[0, R]$, and with boundary conditions $\psi(0) = 0 = \psi(R)$. This
problem arises when we solve the Schrödinger equation for a central potential
in spherical polar coordinates, and assume that the wavefunction is a function
of $r$ only (i.e. S-wave, or $l = 0$). Again, we want the boundary at $R$ to be
infinitely far away, but we will start with $R$ at a large but finite distance,
and then take the $R \to \infty$ limit. Let us first deal with the simple case that
$V(r) \equiv 0$; then the solutions are

$$\psi_k(r) \propto \sin kr, \tag{4.125}$$

with eigenvalue $E = k^2$, and with the allowed values of $k$ being given by
$k_n R = n\pi$. Since

$$\int_0^R \sin^2(k_n r)\,dr = \frac{R}{2}, \tag{4.126}$$


⁶Peierls proved that the phonon contribution to the specific heat of a crystal could be
correctly calculated by using periodic boundary conditions. Some sceptics had thought
that such "unphysical" boundary conditions would give a result wrong by factors of two.

the normalized wavefunctions are

$$\psi_k = \sqrt{\frac{2}{R}}\,\sin kr, \tag{4.127}$$

and completeness reads

$$\sum_{n=1}^{\infty}\left(\frac{2}{R}\right)\sin(k_n r)\sin(k_n r') = \delta(r - r'). \tag{4.128}$$


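As $R$ becomes large, this sum goes over to an integral:

$$\sum_{n=1}^{\infty}\left(\frac{2}{R}\right)\sin(k_n r)\sin(k_n r') \to \int_0^\infty dn\left(\frac{2}{R}\right)\sin(kr)\sin(kr') = \int_0^\infty \frac{R\,dk}{\pi}\left(\frac{2}{R}\right)\sin(kr)\sin(kr'). \tag{4.129}$$

Thus,

$$\left(\frac{2}{\pi}\right)\int_0^\infty dk\,\sin(kr)\sin(kr') = \delta(r - r'). \tag{4.130}$$

As before, the large distance, here $R$, no longer appears.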

Now consider the more interesting problem which has the potential $V(r)$
included. We will assume, for simplicity, that there is an $R_0$ such that $V(r)$
is zero for $r > R_0$. In this case, we know that the solution for $r > R_0$ is of
the form

$$\psi_k(r) = \sin(kr + \eta(k)), \tag{4.131}$$

where the phase shift $\eta(k)$ is a functional of the potential $V$. The eigenvalue
is still $E = k^2$.

Example: A delta-function shell. We take $V(r) = \lambda\delta(r - a)$. See figure 4.3.


Figure 4.3: Delta function shell potential.




A solution with eigenvalue $E = k^2$ and satisfying the boundary condition at
$r = 0$ is

$$\psi(r) = \begin{cases} A\sin(kr), & r < a,\\ \sin(kr + \eta), & r > a. \end{cases} \tag{4.132}$$

The conditions to be satisfied at $r = a$ are:
i) continuity, $\psi(a - \epsilon) = \psi(a + \epsilon) \equiv \psi(a)$, and
ii) jump in slope, $-\psi'(a + \epsilon) + \psi'(a - \epsilon) + \lambda\psi(a) = 0$.


Therefore,

$$\frac{\psi'(a + \epsilon)}{\psi(a)} - \frac{\psi'(a - \epsilon)}{\psi(a)} = \lambda, \tag{4.133}$$

or

$$\frac{k\cos(ka + \eta)}{\sin(ka + \eta)} - \frac{k\cos(ka)}{\sin(ka)} = \lambda. \tag{4.134}$$

Thus,

$$\cot(ka + \eta) - \cot(ka) = \frac{\lambda}{k} \tag{4.135}$$

and

$$\eta(k) = -ka + \cot^{-1}\left(\frac{\lambda}{k} + \cot ka\right). \tag{4.136}$$


Figure 4.4: The phase shift $\eta(k)$ of equation (4.136) plotted against $ka$.


A sketch of $\eta(k)$ is shown in figure 4.4. The allowed values of $k$ are required
by the boundary condition

$$\sin(kR + \eta(k)) = 0 \tag{4.137}$$

to satisfy

$$kR + \eta(k) = n\pi. \tag{4.138}$$


This is a transcendental equation for $k$, and so finding the individual solutions
$k_n$ is not simple. We can, however, write

$$n = \frac{1}{\pi}\left(kR + \eta(k)\right) \tag{4.139}$$

and observe that, when $R$ becomes large, only an infinitesimal change in $k$
is required to make $n$ increment by unity. We may therefore regard $n$ as a
"continuous" variable which we can differentiate with respect to $k$ to find

$$\frac{dn}{dk} = \frac{1}{\pi}\left\{R + \frac{\partial\eta}{\partial k}\right\}. \tag{4.140}$$

The density of allowed $k$ values is therefore

$$\rho(k) = \frac{1}{\pi}\left\{R + \frac{\partial\eta}{\partial k}\right\}. \tag{4.141}$$


Figure 4.5: The density of states for the delta-shell potential. The extended
states are so close in energy that we need an optical aid to resolve individual
levels. The almost-bound resonance levels have to squeeze in between them.


This figure shows a sequence of resonant bound states at $ka = n\pi$ superposed
on the background continuum density of states appropriate to a large box of
length $(R - a)$. Each "spike" contains one extra state, so the average density
of states is that of a box of length $R$. We see that changing the potential
does not create or destroy eigenstates, it just moves them around.

The spike is not exactly a delta function because of level repulsion between 
nearly degenerate eigenstates. The interloper elbows the nearby levels out of 
the way, and all the neighbours have to make do with a bit less room. The 
stronger the coupling between the states on either side of the delta-shell, the 
stronger is the inter-level repulsion, and the broader the resonance spike. 
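The delta-shell phase shift and its contribution to the density of states can be explored numerically. The sketch below works with $\theta = ka + \eta$, defined through $\cot\theta = \lambda/k + \cot ka$ from (4.135), and uses the derivative obtained by differentiating that relation, written in a form that stays finite where $\sin ka = 0$. The function names and the sample parameter values are assumptions made for illustration.

```python
import math

def theta(k, lam, a):
    """Value in [0, pi) of ka + eta(k), from cot(theta) = lam/k + cot(ka)."""
    s, c = math.sin(k * a), math.cos(k * a)
    return math.atan2(s, c + lam * s / k) % math.pi

def deta_dk(k, lam, a):
    """d(eta)/dk from differentiating cot(ka + eta) = lam/k + cot(ka)."""
    s, c = math.sin(k * a), math.cos(k * a)
    return -a + (a + lam * s * s / k**2) / (s * s + (c + lam * s / k) ** 2)

def density_of_states(k, lam, a, R):
    """rho(k) = (R + d(eta)/dk) / pi, as in (4.141)."""
    return (R + deta_dk(k, lam, a)) / math.pi

def jump_mismatch(k, lam, a):
    """psi'(a+) - psi'(a-) - lam*psi(a) for the matched solution (4.132)."""
    th = theta(k, lam, a)
    A = math.sin(th) / math.sin(k * a)        # continuity at r = a
    return k * math.cos(th) - A * k * math.cos(k * a) - lam * math.sin(th)

def integral_deta(lam, a, n, steps=100000):
    """Midpoint-rule integral of d(eta)/dk over n*pi/a < k < (n+1)*pi/a."""
    lo, hi = n * math.pi / a, (n + 1) * math.pi / a
    h = (hi - lo) / steps
    return h * sum(deta_dk(lo + (i + 0.5) * h, lam, a) for i in range(steps))

print(jump_mismatch(1.3, 4.0, 1.0))    # should be close to zero
print(integral_deta(5.0, 1.0, 1))      # should be close to zero
```

The second number illustrates the counting described above: over one period of $ka$ the phase shift returns to its starting value, so the state in the spike makes up for the background being that of a box of length $R - a$ rather than $R$.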


Normalization factor 


We now evaluate

$$\int_0^R dr\,|\psi_k|^2 = N_k^{-2}, \tag{4.142}$$

so as to find the normalized wavefunctions

$$\chi_k = N_k\psi_k. \tag{4.143}$$

Let $\psi_k(r)$ be a solution of

$$H\psi = \left(-\frac{d^2}{dr^2} + V(r)\right)\psi = k^2\psi \tag{4.144}$$

satisfying the boundary condition $\psi_k(0) = 0$, but not necessarily the
boundary condition at $r = R$. Such a solution exists for any $k$. We scale $\psi_k$ by
requiring that $\psi_k(r) = \sin(kr + \eta)$ for $r > R_0$. We now use Lagrange's
identity to write


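$$\begin{aligned}
(k^2 - k'^2)\int_0^R dr\,\psi_k\psi_{k'} &= \int_0^R dr\,\{(H\psi_k)\psi_{k'} - \psi_k(H\psi_{k'})\}\\
&= \left[\psi_k\psi'_{k'} - \psi'_k\psi_{k'}\right]_0^R\\
&= \sin(kR + \eta)\,k'\cos(k'R + \eta) - k\cos(kR + \eta)\sin(k'R + \eta).
\end{aligned} \tag{4.145}$$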


Here, we have used $\psi_{k,k'}(0) = 0$, so the integrated out part vanishes at the
lower limit, and have used the explicit form of $\psi_{k,k'}$ at the upper limit.
Now differentiate with respect to $k$, and then set $k = k'$. We find

$$2k\int_0^R dr\,(\psi_k)^2 = -\frac{1}{2}\sin\bigl(2(kR + \eta)\bigr) + k\left\{R + \frac{\partial\eta}{\partial k}\right\}. \tag{4.146}$$

In other words,

$$\int_0^R dr\,(\psi_k)^2 = \frac{1}{2}\left\{R + \frac{\partial\eta}{\partial k}\right\} - \frac{1}{4k}\sin\bigl(2(kR + \eta)\bigr). \tag{4.147}$$


At this point, we impose the boundary condition at $r = R$. We therefore
have $kR + \eta = n\pi$ and the last term on the right hand side vanishes. The
final result for the normalization integral is therefore

$$\int_0^R dr\,|\psi_k|^2 = \frac{1}{2}\left\{R + \frac{\partial\eta}{\partial k}\right\}. \tag{4.148}$$


Observe that the same expression occurs in both the density of states
and the normalization integral. When we use these quantities to write down
the contribution of the normalized states in the continuous spectrum to the
completeness relation we find that

$$\int_0^\infty dk\left(\frac{dn}{dk}\right)N_k^2\,\psi_k(r)\psi_k(r') = \left(\frac{2}{\pi}\right)\int_0^\infty dk\,\psi_k(r)\psi_k(r'), \tag{4.149}$$

the density of states and normalization factor having cancelled and
disappeared from the end result. This is a general feature of scattering problems:
The completeness relation must give a delta function when evaluated far from
the scatterer where the wavefunctions look like those of a free particle. So,
provided we normalize $\psi_k$ so that it reduces to a free particle wavefunction
at large distance, the measure in the integral over $k$ must also be the same
as for the free particle.
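The normalization integral (4.147) can be checked against direct quadrature for the delta-shell example, where the wavefunction (4.132) is known explicitly. The sketch below is an illustration only: the parameter values, the midpoint rule, and the function name are assumptions, and $\partial\eta/\partial k$ is obtained by differentiating (4.135).

```python
import math

def check_normalization(k, lam, a, R, steps=20000):
    """Return (numerical integral of psi_k^2 over [0, R], right side of (4.147))."""
    s, c = math.sin(k * a), math.cos(k * a)
    denom = s * s + (c + lam * s / k) ** 2
    theta_a = math.atan2(s, c + lam * s / k)   # ka + eta, up to a multiple of pi
    eta = theta_a - k * a
    deta_dk = -a + (a + lam * s * s / k**2) / denom
    A = math.sin(theta_a) / s                  # continuity of psi at r = a

    def psi(r):
        # Wavefunction (4.132): A sin(kr) inside the shell, sin(kr + eta) outside
        return A * math.sin(k * r) if r < a else math.sin(k * r + eta)

    def midpoint(lo, hi):
        h = (hi - lo) / steps
        return h * sum(psi(lo + (i + 0.5) * h) ** 2 for i in range(steps))

    # Integrate separately on either side of the kink at r = a
    numeric = midpoint(0.0, a) + midpoint(a, R)
    formula = 0.5 * (R + deta_dk) - math.sin(2 * (k * R + eta)) / (4 * k)
    return numeric, formula

print(check_normalization(1.7, 3.0, 1.0, 10.0))
```

The check holds for any $k$, because (4.147) was derived before the boundary condition at $r = R$ was imposed.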

Including any bound states in the discrete spectrum, the full statement
of completeness is therefore

$$\sum_{\text{bound states}} \psi_n(r)\psi_n(r') + \left(\frac{2}{\pi}\right)\int_0^\infty dk\,\psi_k(r)\psi_k(r') = \delta(r - r'). \tag{4.150}$$


Example: We will exhibit a completeness relation for a problem on the entire
real line. We have already met the Pöschl-Teller equation,

$$H\psi = \left(-\frac{d^2}{dx^2} - l(l+1)\,\mathrm{sech}^2 x\right)\psi = E\psi \tag{4.151}$$

in exercise 4.7. When $l$ is an integer, the potential in this Schrödinger
equation has the special property that it is reflectionless.
The simplest non-trivial example is $l = 1$. In this case, $H$ has a single
discrete bound state at $E_0 = -1$. The normalized eigenfunction is

$$\psi_0(x) = \frac{1}{\sqrt{2}}\,\mathrm{sech}\,x. \tag{4.152}$$


The rest of the spectrum consists of a continuum of unbound states with
eigenvalues $E(k) = k^2$ and eigenfunctions

$$\psi_k(x) = \frac{1}{\sqrt{1 + k^2}}\,e^{ikx}(-ik + \tanh x). \tag{4.153}$$

Here, $k$ is any real number. The normalization of $\psi_k(x)$ has been chosen so
that, at large $|x|$, where $\tanh x \to \pm 1$, we have

$$\psi_k^*(x)\psi_k(x') \to e^{-ik(x - x')}. \tag{4.154}$$


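The measure in the completeness integral must therefore be $dk/2\pi$, the same
as that for a free particle.
Let us compute the difference

$$\begin{aligned}
I &\equiv \delta(x - x') - \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\psi_k^*(x)\psi_k(x')\\
&= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\left(e^{-ik(x - x')} - \psi_k^*(x)\psi_k(x')\right)\\
&= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-ik(x - x')}\,\frac{1 + ik(\tanh x - \tanh x') - \tanh x\tanh x'}{1 + k^2}.
\end{aligned} \tag{4.155}$$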


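We use the standard integral,

$$\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-ik(x - x')}\frac{1}{1 + k^2} = \frac{1}{2}e^{-|x - x'|}, \tag{4.156}$$

together with its $x'$ derivative,

$$\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-ik(x - x')}\frac{ik}{1 + k^2} = \mathrm{sgn}(x - x')\,\frac{1}{2}e^{-|x - x'|}, \tag{4.157}$$

to find

$$I = \frac{1}{2}\Bigl\{1 + \mathrm{sgn}(x - x')(\tanh x - \tanh x') - \tanh x\tanh x'\Bigr\}e^{-|x - x'|}. \tag{4.158}$$

Assume, without loss of generality, that $x > x'$; then this reduces to

$$\frac{1}{2}(1 + \tanh x)(1 - \tanh x')\,e^{-(x - x')} = \frac{1}{2}\,\mathrm{sech}\,x\,\mathrm{sech}\,x' = \psi_0(x)\psi_0(x'). \tag{4.159}$$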

Thus, the expected completeness condition

$$\psi_0(x)\psi_0(x') + \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\psi_k^*(x)\psi_k(x') = \delta(x - x'), \tag{4.160}$$

is confirmed.
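The algebra leading from (4.158) to the bound-state product is easy to spot-check numerically. The sketch below evaluates the right-hand side of (4.158) for either ordering of $x$ and $x'$ and compares it with $\psi_0(x)\psi_0(x')$ from (4.152); the function names and sample points are hypothetical.

```python
import math

def psi0(x):
    """Normalized l = 1 bound state, (1/sqrt(2)) sech x."""
    return 1.0 / (math.sqrt(2.0) * math.cosh(x))

def difference_kernel(x, xp):
    """Right-hand side of (4.158), valid for either ordering of x and x'."""
    sgn = (x > xp) - (x < xp)
    t, tp = math.tanh(x), math.tanh(xp)
    return 0.5 * (1 + sgn * (t - tp) - t * tp) * math.exp(-abs(x - xp))

for x, xp in [(0.3, -1.2), (-2.0, 0.7), (1.5, 1.5)]:
    print(x, xp, difference_kernel(x, xp), psi0(x) * psi0(xp))
```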


4.4 Further exercises and problems 


We begin with a practical engineering eigenvalue problem. 


Exercise 4.8: Whirling drive shaft. A thin flexible drive shaft is supported by
two bearings that impose the conditions $x' = y' = x = y = 0$ at $z = \pm L$.
Here $x(z)$, $y(z)$ denote the transverse displacements of the shaft, and the
primes denote derivatives with respect to $z$.


Figure 4.6: The n = 1 even-parity mode of a whirling shaft.


The shaft is driven at angular velocity $\omega$. Experience shows that at certain
critical frequencies $\omega_n$ the motion becomes unstable to whirling, a spontaneous
vibration and deformation of the normally straight shaft. If the rotation
frequency is raised above $\omega_n$, the shaft becomes quiescent and straight again
until we reach a frequency $\omega_{n+1}$, at which the pattern is repeated. Our task
is to understand why this happens.

The kinetic energy of the whirling shaft is

$$T = \frac{1}{2}\int_{-L}^{L}\rho\{\dot{x}^2 + \dot{y}^2\}\,dz,$$

and the strain energy due to bending is

$$V[x, y] = \frac{1}{2}\int_{-L}^{L}\gamma\{(x'')^2 + (y'')^2\}\,dz.$$


a) Write down the Lagrangian, and from it obtain the equations of motion
for the shaft.
b) Seek whirling-mode solutions of the equations of motion in the form

$$x(z, t) = \psi(z)\cos\omega t, \qquad y(z, t) = \psi(z)\sin\omega t.$$

Show that this quest requires the solution of the eigenvalue problem

$$\frac{\gamma}{\rho}\frac{d^4\psi}{dz^4} = \omega_n^2\psi, \qquad \psi'(-L) = \psi(-L) = \psi'(L) = \psi(L) = 0.$$

c) Show that the critical frequencies are given in terms of the solutions $\xi_n$
to the transcendental equation

$$\tanh\xi_n = \pm\tan\xi_n, \tag{$\star$}$$

as

$$\omega_n = \sqrt{\frac{\gamma}{\rho}}\left(\frac{\xi_n}{L}\right)^2.$$

Show that the plus sign in $\star$ applies to odd parity modes, where $\psi(z) =
-\psi(-z)$, and the minus sign to even parity modes where $\psi(z) = \psi(-z)$.


Whirling, we conclude, occurs at the frequencies of the natural transverse
vibration modes of the elastic shaft. These modes are excited by slight
imbalances that have negligible effect except when the shaft is being rotated at
the resonant frequency.
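The roots of the transcendental equation in part c) of the whirling-shaft exercise have no closed form, but they are easy to locate numerically. The sketch below rewrites $\tanh\xi = \mp\tan\xi$ as $\sinh\xi\cos\xi \pm \cosh\xi\sin\xi = 0$ to avoid the poles of the tangent, and brackets the lowest root of each parity by bisection. The bracketing intervals and function names are assumptions made for illustration.

```python
import math

def f_even(xi):
    """Vanishes when tanh(xi) = -tan(xi) (even-parity modes)."""
    return math.sinh(xi) * math.cos(xi) + math.cosh(xi) * math.sin(xi)

def f_odd(xi):
    """Vanishes when tanh(xi) = +tan(xi) (odd-parity modes)."""
    return math.sinh(xi) * math.cos(xi) - math.cosh(xi) * math.sin(xi)

def bisect(f, lo, hi, tol=1e-12):
    """Bisection root finder; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_frequency(xi, gamma, rho, L):
    """omega = sqrt(gamma/rho) * (xi/L)^2 for stiffness gamma and density rho."""
    return math.sqrt(gamma / rho) * (xi / L) ** 2

xi_even = bisect(f_even, 0.5 * math.pi, math.pi)   # lowest even-parity root
xi_odd = bisect(f_odd, math.pi, 1.5 * math.pi)     # lowest odd-parity root
print(xi_even, xi_odd)
```

Higher roots can be found in the same way by moving the bracket up by $\pi$ each time.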


Insight into adjoint boundary conditions for an ODE can be obtained by
thinking about how we would impose these boundary conditions in a
numerical solution. The next problem illustrates this.

Problem 4.9: Discrete approximations and self-adjointness. Consider the
second order inhomogeneous equation $Lu \equiv u'' = g(x)$ on the interval $0 < x < 1$.
Here $g(x)$ is known and $u(x)$ is to be found. We wish to solve the problem on a
computer, and so set up a discrete approximation to the ODE in the following
way:
• replace the continuum of independent variables $0 < x < 1$ by the discrete
lattice of points $0 < x_n \equiv (n - \frac{1}{2})/N < 1$. Here $N$ is a positive integer
and $n = 1, 2, \ldots, N$;
• replace the functions $u(x)$ and $g(x)$ by the arrays of real variables $u_n \equiv
u(x_n)$ and $g_n \equiv g(x_n)$;
• replace the continuum differential operator $d^2/dx^2$ by the difference
operator $D^2$, defined by $D^2 u_n \equiv u_{n+1} - 2u_n + u_{n-1}$.

Now do the following problems: 


a) Impose continuum Dirichlet boundary conditions $u(0) = u(1) = 0$.
Decide what these correspond to in the discrete approximation, and write
the resulting set of algebraic equations in matrix form. Show that the
corresponding matrix is real and symmetric.

b) Impose the periodic boundary conditions $u(0) = u(1)$ and $u'(0) = u'(1)$,
and show that these require us to set $u_0 \equiv u_N$ and $u_{N+1} \equiv u_1$. Again
write the system of algebraic equations in matrix form and show that
the resulting matrix is real and symmetric.

c) Consider the non-symmetric N-by-N matrix operator

$$D^2 u = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \ldots & 0\\
1 & -2 & 1 & 0 & 0 & \ldots & 0\\
0 & 1 & -2 & 1 & 0 & \ldots & 0\\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots\\
0 & \ldots & 0 & 1 & -2 & 1 & 0\\
0 & \ldots & 0 & 0 & 1 & -2 & 1\\
0 & \ldots & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} u_N\\ u_{N-1}\\ u_{N-2}\\ \vdots\\ u_3\\ u_2\\ u_1 \end{pmatrix}.$$

i) What vectors span the null space of $D^2$?
ii) To what continuum boundary conditions for $d^2/dx^2$ does this matrix
correspond?
iii) Consider the matrix $(D^2)^\dagger$. To what continuum boundary
conditions does this matrix correspond? Are they the adjoint boundary
conditions for the differential operator in part ii)?
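A few lines of code make it easy to experiment with the difference operator of problem 4.9. The sketch below builds the periodic matrix implied by the identifications $u_0 \equiv u_N$, $u_{N+1} \equiv u_1$ and the matrix displayed in part c), tests each for symmetry, and applies the part c) matrix to samples of $u = x^2$ on the half-integer lattice. The helper names are hypothetical, and the vector is stored in increasing index order, which leaves the part c) matrix unchanged because its stencil is symmetric.

```python
def periodic_matrix(N):
    """D^2 with u_0 = u_N and u_{N+1} = u_1 (indices taken modulo N)."""
    M = [[0] * N for _ in range(N)]
    for n in range(N):
        M[n][n] += -2
        M[n][(n + 1) % N] += 1
        M[n][(n - 1) % N] += 1
    return M

def part_c_matrix(N):
    """The matrix of part c): stencil rows inside, zero first and last rows."""
    M = [[0] * N for _ in range(N)]
    for n in range(1, N - 1):
        M[n][n - 1], M[n][n], M[n][n + 1] = 1, -2, 1
    return M

def is_symmetric(M):
    return all(M[i][j] == M[j][i] for i in range(len(M)) for j in range(len(M)))

def apply(M, u):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, u)) for row in M]

N = 8
lattice = [(n - 0.5) / N for n in range(1, N + 1)]   # x_n = (n - 1/2)/N
samples = [x * x for x in lattice]                   # u_n = x_n^2

print(is_symmetric(periodic_matrix(N)), is_symmetric(part_c_matrix(N)))
print(apply(part_c_matrix(N), samples))   # interior entries equal 2/N^2
```

The interior entries reproduce $u'' = 2$ once divided by the squared lattice spacing $1/N^2$, while the first and last rows return zero whatever the input.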


Exercise 4.10: Let

$$\hat{H} = \begin{pmatrix} -i\partial_x & m_1 - im_2\\ m_1 + im_2 & i\partial_x \end{pmatrix}
= -i\hat\sigma_3\partial_x + m_1\hat\sigma_1 + m_2\hat\sigma_2$$

be a one-dimensional Dirac Hamiltonian. Here $m_1(x)$ and $m_2(x)$ are real
functions and the $\hat\sigma_i$ are the Pauli matrices. The matrix differential operator
$\hat{H}$ acts on the two-component "spinor"

$$\Psi(x) = \begin{pmatrix} \psi_1(x)\\ \psi_2(x) \end{pmatrix}.$$

a) Consider the eigenvalue problem $\hat{H}\Psi = E\Psi$ on the interval $[a, b]$. Show
that the boundary conditions

$$\frac{\psi_1(a)}{\psi_2(a)} = \exp\{i\theta_a\}, \qquad \frac{\psi_1(b)}{\psi_2(b)} = \exp\{i\theta_b\},$$

where $\theta_a$, $\theta_b$ are real angles, make $\hat{H}$ into an operator that is self-adjoint
with respect to the inner product

$$\langle\Psi_1, \Psi_2\rangle = \int_a^b \Psi_1^\dagger(x)\Psi_2(x)\,dx.$$

b) Find the eigenfunctions $\Psi_n$ and eigenvalues $E_n$ in the case that $m_1 =
m_2 = 0$ and the $\theta_{a,b}$ are arbitrary real angles.


Here are three further problems involving the completeness of operators with 
a continuous spectrum: 


Problem 4.11: Missing State. In problem 4.7 you will have found that the
Schrödinger equation

$$\left(-\frac{d^2}{dx^2} - 2\,\mathrm{sech}^2 x\right)\psi = E\psi$$

has eigensolutions

$$\psi_k(x) = e^{ikx}(-ik + \tanh x)$$

with eigenvalue $E = k^2$.

• For $x$ large and positive $\psi_k(x) \sim A\,e^{ikx}e^{i\eta(k)}$, while for $x$ large and
negative $\psi_k(x) \sim A\,e^{ikx}e^{-i\eta(k)}$, the (complex) constant $A$ being the same
in both cases. Express the phase shift $\eta(k)$ as the inverse tangent of an
algebraic expression in $k$.

• Impose periodic boundary conditions $\psi(-L/2) = \psi(+L/2)$ where $L \gg 1$.
Find the allowed values of $k$ and hence an explicit expression for the
$k$-space density, $\rho(k) = \frac{dn}{dk}$, of the eigenstates.

• Compare your formula for $\rho(k)$ with the corresponding expression, $\rho_0(k) =
L/2\pi$, for the eigenstate density of the zero-potential equation and
compute the integral

$$\Delta N = \int_{-\infty}^{\infty}\{\rho(k) - \rho_0(k)\}\,dk.$$

• Deduce that one eigenfunction has gone missing from the continuum and
become the localized bound state $\psi_0(x) = \frac{1}{\sqrt{2}}\,\mathrm{sech}\,x$.


Problem 4.12: Continuum Completeness. Consider the differential operator

$$\hat{L} = -\frac{d^2}{dx^2}, \qquad 0 \le x < \infty$$

with self-adjoint boundary conditions $\psi(0)/\psi'(0) = \tan\theta$ for some fixed angle
$\theta$.

• Show that when $\tan\theta < 0$ there is a single normalizable negative-eigenvalue
eigenfunction localized near the origin, but none when $\tan\theta > 0$.

• Show that there is a continuum of positive-eigenvalue eigenfunctions of
the form $\psi_k(x) = \sin(kx + \eta(k))$ where the phase shift $\eta$ is found from

$$e^{i\eta(k)} = \frac{1 + ik\tan\theta}{\sqrt{1 + k^2\tan^2\theta}}.$$

• Write down (no justification required) the appropriate completeness
relation

$$\delta(x - x') = \int \left(\frac{dn}{dk}\right)N_k^2\,\psi_k(x)\psi_k(x')\,dk + \sum_{\text{bound}}\psi_n(x)\psi_n(x')$$

with an explicit expression for the product (not the separate factors) of
the density of states and the normalization constant $N_k^2$, and with the
correct limits on the integral over $k$.

• Confirm that the $\psi_k$ continuum on its own, or together with the bound
state when it exists, form a complete set. You will do this by evaluating
the integral

$$I(x, x') = \frac{2}{\pi}\int_0^\infty \sin(kx + \eta(k))\sin(kx' + \eta(k))\,dk$$

and interpreting the result. You will need the following standard integral

$$\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ikx}\frac{1}{1 + k^2t^2} = \frac{1}{2|t|}\,e^{-|x/t|}.$$

Take care! You should monitor how the bound state contribution switches
on and off as $\theta$ is varied. Keeping track of the modulus signs $|\ldots|$ in the
standard integral is essential for this.

Problem 4.13: One-dimensional scattering redux. Consider again the
one-dimensional Schrödinger equation from chapter 3 problem 3.4:

$$-\frac{d^2\psi}{dx^2} + V(x)\psi = E\psi,$$

where $V(x)$ is zero except in a finite interval $[-a, a]$ near the origin.


Figure 4.7: Incoming and outgoing waves in problem 4.13. The asymptotic
regions $L$ and $R$ are defined by $L = \{x < -a\}$ and $R = \{x > a\}$.


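For $k > 0$, consider solutions of the form

$$\psi(x) = \begin{cases} a_L^{\rm in}e^{ikx} + a_L^{\rm out}e^{-ikx}, & x \in L,\\ a_R^{\rm in}e^{-ikx} + a_R^{\rm out}e^{ikx}, & x \in R. \end{cases}$$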
a) Show that, in the notation of problem 3.4, we have

$$\begin{bmatrix} a_L^{\rm out}\\ a_R^{\rm out} \end{bmatrix} = \begin{bmatrix} r_L(k) & t_R(-k)\\ t_L(k) & r_R(-k) \end{bmatrix}\begin{bmatrix} a_L^{\rm in}\\ a_R^{\rm in} \end{bmatrix},$$

and show that the S-matrix

$$S(k) \equiv \begin{bmatrix} r_L(k) & t_R(-k)\\ t_L(k) & r_R(-k) \end{bmatrix}$$

is unitary.
b) By observing that complex conjugation interchanges the "in" and "out"
waves, show that it is natural to extend the definition of the transmission
and reflection coefficients to all real $k$ by setting $r_{L,R}(k) = r^*_{L,R}(-k)$,
$t_{L,R}(k) = t^*_{L,R}(-k)$.

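c) In problem 3.4 we introduced the particular solutions

$$\psi_k(x) = \begin{cases}
\begin{cases} e^{ikx} + r_L(k)e^{-ikx}, & x \in L,\\ t_L(k)e^{ikx}, & x \in R, \end{cases} & k > 0,\\[2ex]
\begin{cases} t_R(k)e^{ikx}, & x \in L,\\ e^{ikx} + r_R(k)e^{-ikx}, & x \in R, \end{cases} & k < 0.
\end{cases}$$

Show that, together with any bound states $\psi_n(x)$, these $\psi_k(x)$ satisfy
the completeness relation

$$\sum_{\text{bound}}\psi_n^*(x)\psi_n(x') + \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\psi_k^*(x)\psi_k(x') = \delta(x - x')$$

provided that

$$\begin{aligned}
-\sum_{\text{bound}}\psi_n^*(x)\psi_n(x') &= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,r_L(k)\,e^{-ik(x + x')}, & x, x' \in L,\\
&= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,t_L(k)\,e^{-ik(x - x')}, & x \in L,\ x' \in R,\\
&= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,t_R(k)\,e^{-ik(x - x')}, & x \in R,\ x' \in L,\\
&= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,r_R(k)\,e^{-ik(x + x')}, & x, x' \in R.
\end{aligned}$$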

d) Compute $r_{L,R}(k)$ and $t_{L,R}(k)$ for the potential $V(x) = -\lambda\delta(x)$, and verify
that the conditions in part c) are satisfied.


If you are familiar with complex variable methods, look ahead to chapter
18 where problem 18.22 shows you how to use complex variable methods to
evaluate the Fourier transforms in part c), and so confirm that the bound state
$\psi_n(x)$ and the $\psi_k(x)$ together constitute a complete set of eigenfunctions.


Problem 4.14: Levinson's Theorem and the Friedel sum rule. The interaction
between an attractive impurity and (S-wave, and ignoring spin) electrons in
a metal can be modelled by a one-dimensional Schrödinger equation

$$-\frac{d^2\chi}{dr^2} + V(r)\chi = k^2\chi.$$

Here $r$ is the distance away from the impurity and $V(r)$ is the (spherically
symmetric) impurity potential and $\chi(r) = \sqrt{4\pi}\,r\,\psi(r)$ where $\psi(r)$ is the
three-dimensional wavefunction. The impurity attracts electrons to its vicinity. Let
$\chi_k^0(r) = \sin(kr)$ denote the unperturbed wavefunction, and $\chi_k(r)$ denote the
perturbed wavefunction that beyond the range of impurity potential becomes
$\sin(kr + \eta(k))$. We fix the $2n\pi$ ambiguity in the definition of $\eta(k)$ by taking
$\eta(\infty)$ to be zero, and requiring $\eta(k)$ to be a continuous function of $k$.


• Show that the continuous-spectrum contribution to the change in the
number of electrons within a sphere of radius $R$ surrounding the impurity
is given by

$$\frac{2}{\pi}\int_0^{k_f}\left(\int_0^R\left\{|\chi_k(r)|^2 - |\chi_k^0(r)|^2\right\}dr\right)dk = \frac{1}{\pi}\left[\eta(k_f) - \eta(0)\right] + \text{oscillations}.$$

Here $k_f$ is the Fermi momentum, and "oscillations" refers to Friedel
oscillations $\approx \cos(2(k_f R + \eta))$. You should write down an explicit expression
for the Friedel oscillation term, and recognize it as the Fourier transform
of a function $\propto k^{-1}\sin\eta(k)$.

• Appeal to the Riemann-Lebesgue lemma to argue that the Friedel density
oscillations make no contribution to the accumulated electron number in
the limit $R \to \infty$.

(Hint: You may want to look ahead to the next part of the problem in
order to show that $k^{-1}\sin\eta(k)$ remains finite as $k \to 0$.)


The impurity-induced change in the number of unbound electrons in the in- 
terval [0, R] is generically some fraction of an electron, and, in the case of 
an attractive potential, can be negative — the phase-shift being positive and 
decreasing steadily to zero as k increases to infinity. This should not be sur- 
prising. Each electron in the Fermi sea speeds up as it enters an attractive 
potential well, spends less time there, and so makes a smaller contribution 
to the average local density than it would in the absence of the potential. 
We would, however, surely expect an attractive potential to accumulate a net 
positive number of electrons. 


• Show that a negative continuous-spectrum contribution to the
accumulated electron number is more than compensated for by a positive number

$$N_{\rm bound} = \int_0^\infty(\rho_0(k) - \rho(k))\,dk = -\int_0^\infty\frac{1}{\pi}\frac{\partial\eta}{\partial k}\,dk = \frac{1}{\pi}\eta(0)$$

of electrons bound to the potential. After accounting for these bound
electrons, show that the total number of electrons accumulated near the
impurity is

$$Q_{\rm tot} = \frac{1}{\pi}\eta(k_f).$$

This formula (together with its higher angular momentum versions) is known
as the Friedel sum rule. The relation between $\eta(0)$ and the number of
bound states is called Levinson's theorem. A more rigorous derivation
of this theorem would show that $\eta(0)$ may take the value $(n + 1/2)\pi$
when there is a non-normalizable zero-energy "half-bound" state. In
this exceptional case the accumulated charge will depend on $R$.


Chapter 5


Green Functions 


In this chapter we will study strategies for solving the inhomogeneous linear
differential equation $Ly = f$. The tool we use is the Green function, which
is an integral kernel representing the inverse operator $L^{-1}$. Apart from their
use in solving inhomogeneous equations, Green functions play an important
role in many areas of physics.


5.1 Inhomogeneous linear equations 


We wish to solve $Ly = f$ for $y$. Before we set about doing this, we should
ask ourselves whether a solution exists, and, if it does, whether it is unique.
The answers to these questions are summarized by the Fredholm alternative.


5.1.1 Fredholm alternative 


The Fredholm alternative for operators on a finite-dimensional vector space 
is discussed in detail in the appendix on linear algebra. You will want to 
make sure that you have read and understood this material. Here, we merely 
restate the results. 
Let $V$ be a finite-dimensional vector space equipped with an inner product,
and let $A$ be a linear operator $A: V \to V$ on this space. Then
I. Either 
i) Ax = b has a unique solution, 
or 
ii) Ax = 0 has a non-trivial solution. 




II. If $Ax = 0$ has $n$ linearly independent solutions, then so does $A^\dagger x = 0$.
III. If alternative ii) holds, then $Ax = b$ has no solution unless $b$ is perpendicular
to all solutions of $A^\dagger x = 0$.
What is important for us in the present chapter is that this result continues
to hold for linear differential operators $L$ on a finite interval, provided that
we define $L^\dagger$ as in the previous chapter, and provided the number of boundary
conditions is equal to the order of the equation.

If the number of boundary conditions is not equal to the order of the
equation then the number of solutions to $Ly = 0$ and $L^\dagger y = 0$ will differ in
general. It is still true, however, that $Ly = f$ has no solution unless $f$ is
perpendicular to all solutions of $L^\dagger y = 0$.
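
The finite-dimensional statements I to III can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the matrix `A` and the helper names `null_space` and `is_solvable` are choices made for this illustration and do not come from the text.

```python
import numpy as np

# A singular 3x3 matrix, so alternative ii) holds: A x = 0 has a non-trivial solution.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) for the null space of M, read off from the SVD."""
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T

def is_solvable(M, b, tol=1e-8):
    """M x = b has a solution exactly when the least-squares residual vanishes."""
    x, *_ = np.linalg.lstsq(M, b, rcond=None)
    return bool(np.linalg.norm(M @ x - b) < tol)

ker_A = null_space(A)                 # solutions of A x = 0
ker_Adag = null_space(A.conj().T)     # solutions of A^dagger x = 0

b_good = A @ np.array([1.0, -1.0, 2.0])   # lies in the range of A by construction
b_bad = b_good + ker_Adag[:, 0]           # acquires a component along ker(A^dagger)

print("dim ker A =", ker_A.shape[1], " dim ker A^dagger =", ker_Adag.shape[1])
print("b_good solvable:", is_solvable(A, b_good))
print("b_bad solvable: ", is_solvable(A, b_bad))
```

Both null spaces have the same dimension (statement II), and the right-hand side that fails to be perpendicular to the solution of $A^\dagger x = 0$ admits no solution (statement III).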

Example: As an illustration of what happens when an equation has too
many boundary conditions, consider
$$L = \frac{d}{dx}, \qquad y(0) = y(1) = 0. \tag{5.1}$$


Clearly Ly = 0 has only the trivial solution y = 0. If a solution to Ly = f 
exists, therefore, it will be unique. 

We know that $L^\dagger = -d/dx$, with no boundary conditions on the functions
in its domain. The equation $L^\dagger y = 0$ therefore has the non-trivial solution
$y = 1$. This means that there should be no solution to $Ly = f$ unless
$$\langle 1, f\rangle = \int_0^1 f\,dx = 0. \tag{5.2}$$
If this condition is satisfied then
$$y(x) = \int_0^x f(x')\,dx' \tag{5.3}$$
satisfies both the differential equation and the boundary conditions at $x = 0, 1$.
If the condition is not satisfied, $y(x)$ is not a solution, because $y(1) \neq 0$.

Initially we only solve Ly = f for homogeneous boundary conditions. 
After we have understood how to do this, we will extend our methods to deal 
with differential equations with inhomogeneous boundary conditions. 


5.2 Constructing Green functions 


We will solve $Ly = f$, a differential equation with homogeneous boundary
conditions, by finding an inverse operator $L^{-1}$, so that $y = L^{-1}f$. This
inverse operator $L^{-1}$ will be represented by an integral kernel
$$(L^{-1})_{x,\xi} = G(x, \xi), \tag{5.4}$$
with the property
$$L_x G(x, \xi) = \delta(x - \xi). \tag{5.5}$$
Here, the subscript $x$ on $L$ indicates that $L$ acts on the first argument, $x$, of
$G$. Then
$$y(x) = \int G(x, \xi) f(\xi)\,d\xi \tag{5.6}$$
will obey
$$L_x y = \int L_x G(x, \xi) f(\xi)\,d\xi = \int \delta(x - \xi) f(\xi)\,d\xi = f(x). \tag{5.7}$$


The problem is how to construct $G(x, \xi)$. There are three necessary ingredients:
• the function $\chi(x) \equiv G(x, \xi)$ must have some discontinuous behaviour
at $x = \xi$ in order to generate the delta function;
• away from $x = \xi$, the function $\chi(x)$ must obey $L\chi = 0$;
• the function $\chi(x)$ must obey the homogeneous boundary conditions
required of $y$ at the ends of the interval.
The last ingredient ensures that the resulting solution, $y(x)$, obeys the boundary
conditions. It also ensures that the range of the integral operator $G$ lies
within the domain of $L$, a prerequisite if the product $LG = I$ is to make
sense. The manner in which these ingredients are assembled to construct
$G(x, \xi)$ is best explained through examples.


5.2.1 Sturm-Liouville equation 


We begin by constructing the solution to the equation
$$(p(x)y')' + q(x)y(x) = f(x) \tag{5.8}$$
on the finite interval $[a, b]$ with homogeneous self-adjoint boundary conditions
$$\frac{y'(a)}{y(a)} = \tan\theta_L, \qquad \frac{y'(b)}{y(b)} = \tan\theta_R. \tag{5.9}$$
We therefore seek a function $G(x, \xi)$ such that $\chi(x) = G(x, \xi)$ obeys
$$L\chi = (p\chi')' + q\chi = \delta(x - \xi). \tag{5.10}$$
The function $\chi(x)$ must also obey the homogeneous boundary conditions we
require of $y(x)$.

Now (5.10) tells us that $\chi(x)$ must be continuous at $x = \xi$. For if not, the
two differentiations applied to a jump function would give us the derivative
of a delta function, and we want only a plain $\delta(x - \xi)$. If we write
$$G(x, \xi) = \chi(x) = \begin{cases} A\,y_L(x)\,y_R(\xi), & x < \xi, \\ A\,y_L(\xi)\,y_R(x), & x > \xi, \end{cases} \tag{5.11}$$


then $\chi(x)$ is automatically continuous at $x = \xi$. We take $y_L(x)$ to be a
solution of $Ly = 0$, chosen to satisfy the boundary condition at the left hand
end of the interval. Similarly $y_R(x)$ should solve $Ly = 0$ and satisfy the
boundary condition at the right hand end. With these choices we satisfy
(5.10) at all points away from $x = \xi$.

To work out how to satisfy the equation exactly at the location of the
delta function, we integrate (5.10) from $\xi - \varepsilon$ to $\xi + \varepsilon$ and find that
$$p(\xi)\left[\chi'(\xi + \varepsilon) - \chi'(\xi - \varepsilon)\right] = 1. \tag{5.12}$$
With our product form for $\chi(x)$, this jump condition becomes
$$A\,p(\xi)\left(y_L(\xi)\,y_R'(\xi) - y_L'(\xi)\,y_R(\xi)\right) = 1 \tag{5.13}$$
and determines the constant $A$. We recognize the Wronskian $W(y_L, y_R; \xi)$
on the left hand side of this equation. We therefore have $A = 1/(pW)$ and
$$G(x, \xi) = \begin{cases} \dfrac{1}{pW}\,y_L(x)\,y_R(\xi), & x < \xi, \\[1ex] \dfrac{1}{pW}\,y_L(\xi)\,y_R(x), & x > \xi. \end{cases} \tag{5.14}$$


For the Sturm-Liouville equation the product $pW$ is constant. This fact
follows from Liouville’s formula,
$$W(x) = W(0)\exp\left\{-\int_0^x \frac{p_1(\xi)}{p_0(\xi)}\,d\xi\right\}, \tag{5.15}$$
and from $p_1 = p_0' = p'$ in the Sturm-Liouville equation. Thus
$$W(x) = W(0)\exp\left(-\ln[p(x)/p(0)]\right) = W(0)\,\frac{p(0)}{p(x)}. \tag{5.16}$$
The constancy of $pW$ means that $G(x, \xi)$ is symmetric:
$$G(x, \xi) = G(\xi, x). \tag{5.17}$$


This is as it should be. The inverse of a symmetric matrix (and the real, 
self-adjoint, Sturm-Liouville operator is the function-space analogue of a real 
symmetric matrix) is itself symmetric. 
The solution to
$$Ly = (py')' + qy = f(x) \tag{5.18}$$
is therefore
$$y(x) = \frac{1}{pW}\left\{ y_L(x)\int_x^b y_R(\xi) f(\xi)\,d\xi + y_R(x)\int_a^x y_L(\xi) f(\xi)\,d\xi \right\}. \tag{5.19}$$
Take care to understand the ranges of integration in this formula. In the
first integral $\xi > x$ and we use $G(x, \xi) \propto y_L(x)y_R(\xi)$. In the second integral
$\xi < x$ and we use $G(x, \xi) \propto y_L(\xi)y_R(x)$. It is easy to get these the wrong
way round.

Because we must divide by it in constructing $G(x, \xi)$, it is necessary that
the Wronskian $W(y_L, y_R)$ not be zero. This is reasonable. If $W$ were zero
then $y_L \propto y_R$, and the single function $y_R$ satisfies both $Ly_R = 0$ and the
boundary conditions. This means that the differential operator $L$ has $y_R$ as
a zero-mode, so there can be no unique solution to $Ly = f$.

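Example: Solve
$$-\partial_x^2 y = f(x), \qquad y(0) = y(1) = 0. \tag{5.20}$$
We have
$$\left.\begin{array}{l} y_L = x \\ y_R = 1 - x \end{array}\right\} \;\Rightarrow\; y_L' y_R - y_L y_R' \equiv 1. \tag{5.21}$$
We find that
$$G(x, \xi) = \begin{cases} x(1 - \xi), & x < \xi, \\ \xi(1 - x), & x > \xi, \end{cases} \tag{5.22}$$

Figure 5.1: The function $\chi(x) = G(x, \xi)$.

and
$$y(x) = (1 - x)\int_0^x \xi f(\xi)\,d\xi + x\int_x^1 (1 - \xi) f(\xi)\,d\xi. \tag{5.23}$$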
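
As a quick numerical check of this example, the sketch below integrates the kernel $x(1-\xi)$, $\xi(1-x)$ against a test source and compares the result with a solution found by hand. It assumes NumPy; the source $f(x) = \sin \pi x$, the grid size, and the helper names `green`, `trapezoid` and `solve` are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def green(x, xi):
    """Green function for -y'' = f with y(0) = y(1) = 0."""
    return np.where(x < xi, x * (1.0 - xi), xi * (1.0 - x))

def trapezoid(values, grid):
    """Composite trapezoidal rule on the given grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def solve(f, x, n=4001):
    """y(x) as the integral over [0, 1] of G(x, xi) f(xi) d(xi)."""
    xi = np.linspace(0.0, 1.0, n)
    return trapezoid(green(x, xi) * f(xi), xi)

f = lambda s: np.sin(np.pi * s)                   # test source (an arbitrary choice)
exact = lambda s: np.sin(np.pi * s) / np.pi**2    # obeys -y'' = f and y(0) = y(1) = 0

for x in (0.25, 0.5, 0.8):
    print(x, solve(f, x), exact(x))
```

The two columns should agree to within the quadrature error, and `green(x, xi)` is visibly symmetric under exchange of its arguments.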


5.2.2 Initial-value problems 


Initial value problems are those boundary-value problems where all boundary
conditions are imposed at one end of the interval, instead of some conditions
at one end and some at the other. The same ingredients go into constructing
the Green function, though.

Consider the problem
$$\frac{dy}{dt} - Q(t)y = F(t), \qquad y(0) = 0. \tag{5.24}$$
We seek a Green function such that
$$L_t G(t, t') \equiv \left(\frac{d}{dt} - Q(t)\right) G(t, t') = \delta(t - t') \tag{5.25}$$
and $G(0, t') = 0$.
We need $\chi(t) = G(t, t')$ to satisfy $L_t\chi = 0$, except at $t = t'$, and need
$\chi(0) = 0$. The unique solution of $L_t\chi = 0$ with $\chi(0) = 0$ is $\chi(t) \equiv 0$. This
means that $G(t, t') = 0$ for all $t < t'$. Near $t = t'$ we have the jump condition
$$G(t' + \varepsilon, t') - G(t' - \varepsilon, t') = 1. \tag{5.26}$$
The unique solution is
$$G(t, t') = \theta(t - t')\exp\left\{\int_{t'}^t Q(s)\,ds\right\}, \tag{5.27}$$
where $\theta(t - t')$ is the Heaviside step distribution
$$\theta(t) = \begin{cases} 0, & t < 0, \\ 1, & t > 0. \end{cases} \tag{5.28}$$
Figure 5.2: The Green function $G(t, t')$ for the first-order initial value problem.

Therefore
$$\begin{aligned} y(t) &= \int_0^\infty G(t, t') F(t')\,dt' \\ &= \int_0^t \exp\left\{\int_{t'}^t Q(s)\,ds\right\} F(t')\,dt' \\ &= \exp\left\{\int_0^t Q(s)\,ds\right\}\int_0^t \exp\left\{-\int_0^{t'} Q(s)\,ds\right\} F(t')\,dt'. \end{aligned} \tag{5.29}$$
We earlier obtained this solution via variation of parameters.
Example: Forced, Damped, Harmonic Oscillator. An oscillator obeys the
equation
$$\ddot{x} + 2\gamma\dot{x} + (\Omega^2 + \gamma^2)x = F(t). \tag{5.30}$$
Here $\gamma > 0$ is the friction coefficient. Assuming that the oscillator is at rest
at the origin at $t = 0$, we will show that
$$x(t) = \left(\frac{1}{\Omega}\right)\int_0^t e^{-\gamma(t - \tau)}\sin\Omega(t - \tau)\,F(\tau)\,d\tau. \tag{5.31}$$
We seek a Green function $G(t, \tau)$ such that $\chi(t) = G(t, \tau)$ obeys
$\chi(0) = \chi'(0) = 0$. Again, the unique solution of the differential equation with this
initial data is $\chi(t) \equiv 0$. The Green function must be continuous at $t = \tau$,
but its derivative must be discontinuous there, jumping from zero to unity
to provide the delta function. Thereafter, it must satisfy the homogeneous
equation. The unique function satisfying all these requirements is
$$G(t, \tau) = \theta(t - \tau)\,\frac{1}{\Omega}\,e^{-\gamma(t - \tau)}\sin\Omega(t - \tau). \tag{5.32}$$
Figure 5.3: The Green function $G(t, \tau)$ for the damped oscillator problem.


Both these initial-value Green functions G(t, t’) are identically zero when 
t < t’. This is because the Green function is the response of the system to a 
kick at time t = t’, and in physical problems no effect comes before its cause. 
Such Green functions are said to be causal. 
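
The causal Green function of the damped oscillator can be tested against a direct integration of the equation of motion. The sketch below is one way to do this, assuming NumPy; the parameter values, the forcing $\cos(1.5\,t)$, and the function names `green`, `x_green` and `x_rk4` are assumptions made for the illustration, and the Runge-Kutta integrator is a generic substitute, not a method used in the text.

```python
import numpy as np

gamma, Omega = 0.3, 2.0                    # illustrative damping and frequency
force = lambda t: np.cos(1.5 * t)          # arbitrary test forcing F(t)

def green(t, tau):
    """Causal Green function: zero for t < tau, damped sinusoid afterwards."""
    s = t - tau
    return np.where(s > 0.0, np.exp(-gamma * s) * np.sin(Omega * s) / Omega, 0.0)

def x_green(t, n=20001):
    """Response at time t as the integral of G(t, tau) F(tau) over [0, t]."""
    tau = np.linspace(0.0, t, n)
    vals = green(t, tau) * force(tau)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(tau)))

def x_rk4(t_end, steps=5000):
    """Integrate x'' + 2 gamma x' + (Omega^2 + gamma^2) x = F from rest with RK4."""
    def rhs(t, state):
        x, v = state
        return np.array([v, force(t) - 2.0 * gamma * v - (Omega**2 + gamma**2) * x])
    h = t_end / steps
    state, t = np.zeros(2), 0.0
    for _ in range(steps):
        k1 = rhs(t, state)
        k2 = rhs(t + 0.5 * h, state + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, state + 0.5 * h * k2)
        k4 = rhs(t + h, state + h * k3)
        state = state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return float(state[0])

print(x_green(5.0), x_rk4(5.0))
```

The two numbers should agree closely, and `green(t, tau)` returns zero whenever `t < tau`, which is the causality property described above.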


Physics application: friction without friction—the Caldeira-Leggett 
model in real time. 


We now describe an application of the initial-value problem Green function 
we found in the preceding example. 

When studying the quantum mechanics of systems with friction, such as
the viscously damped oscillator, we need a tractable model of the dissipative
process. Such a model was introduced by Caldeira and Leggett.¹ They
consider the Lagrangian
$$L = \frac{1}{2}\left(\dot{Q}^2 - (\Omega^2 - \Delta\Omega^2)Q^2\right) - Q\sum_i f_i q_i + \sum_i \frac{1}{2}\left(\dot{q}_i^2 - \omega_i^2 q_i^2\right), \tag{5.33}$$


which describes a macroscopic variable $Q(t)$, linearly coupled to an oscillator
bath of very many simple systems $q_i$ representing the environment. The
quantity
$$\Delta\Omega^2 \stackrel{\rm def}{=} -\sum_i \left(\frac{f_i^2}{\omega_i^2}\right) \tag{5.34}$$
is a counter-term that is inserted to cancel the frequency shift
$$\Omega^2 \to \Omega^2 - \sum_i \left(\frac{f_i^2}{\omega_i^2}\right) \tag{5.35}$$
caused by the coupling to the bath.²

¹A. Caldeira, A. J. Leggett, Phys. Rev. Lett. 46 (1981) 211.
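The equations of motion are
$$\begin{aligned} \ddot{Q} + (\Omega^2 - \Delta\Omega^2)Q + \sum_i f_i q_i &= 0, \\ \ddot{q}_i + \omega_i^2 q_i + f_i Q &= 0. \end{aligned} \tag{5.36}$$
Using our initial-value Green function, we solve for the $q_i$ in terms of $Q(t)$:
$$f_i q_i = -\int_{-\infty}^t \left(\frac{f_i^2}{\omega_i}\right)\sin\omega_i(t - \tau)\,Q(\tau)\,d\tau. \tag{5.37}$$
The resulting motion of the $q_i$ feeds back into the equation for $Q$ to give
$$\ddot{Q} + (\Omega^2 - \Delta\Omega^2)Q + \int_{-\infty}^t F(t - \tau)Q(\tau)\,d\tau = 0, \tag{5.38}$$
where
$$F(t) \stackrel{\rm def}{=} -\sum_i \left(\frac{f_i^2}{\omega_i}\right)\sin(\omega_i t) \tag{5.39}$$
is a memory function.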
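It is now convenient to introduce a spectral function
$$J(\omega) \stackrel{\rm def}{=} \frac{\pi}{2}\sum_i \left(\frac{f_i^2}{\omega_i}\right)\delta(\omega - \omega_i), \tag{5.40}$$
which characterizes the spectrum of couplings and frequencies associated
with the oscillator bath. In terms of $J(\omega)$ we can write
$$F(t) = -\frac{2}{\pi}\int_0^\infty J(\omega)\sin(\omega t)\,d\omega. \tag{5.41}$$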


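²The shift arises because a static $Q$ displaces the bath oscillators so that
$f_i q_i = -(f_i^2/\omega_i^2)Q$. Substituting these values for the $f_i q_i$ into the potential terms shows that, in
the absence of $\Delta\Omega^2 Q^2$, the effective potential seen by $Q$ would be
$$\frac{1}{2}\Omega^2 Q^2 + Q\sum_i f_i q_i + \sum_i \frac{1}{2}\omega_i^2 q_i^2 = \frac{1}{2}\left(\Omega^2 - \sum_i \left(\frac{f_i^2}{\omega_i^2}\right)\right)Q^2.$$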
Although $J(\omega)$ is defined as a sum of delta function “spikes,” the oscillator
bath contains a very large number of systems and this makes $J(\omega)$ effectively
a smooth function. This is just as the density of a gas (a sum of delta
functions at the location of the atoms) is macroscopically smooth. By taking
different forms for $J(\omega)$ we can represent a wide range of environments.
Caldeira and Leggett show that to obtain a friction force proportional to
$\dot{Q}$ we should make $J(\omega)$ proportional to the frequency $\omega$. To see how this
works, consider the choice
$$J(\omega) = \eta\omega\left[\frac{\Lambda^2}{\Lambda^2 + \omega^2}\right], \tag{5.42}$$


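which is equal to $\eta\omega$ for small $\omega$, but tends to zero when $\omega \gg \Lambda$. The
high-frequency cutoff $\Lambda$ is introduced to make the integrals over $\omega$ converge.
With this cutoff
$$\frac{2}{\pi}\int_0^\infty J(\omega)\sin(\omega t)\,d\omega = \frac{2}{\pi}\int_{-\infty}^\infty \frac{\eta\Lambda^2\,\omega\,e^{i\omega t}}{2i(\Lambda^2 + \omega^2)}\,d\omega = {\rm sgn}(t)\,\eta\Lambda^2 e^{-\Lambda|t|}. \tag{5.43}$$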


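Therefore,
$$\begin{aligned} \int_{-\infty}^t F(t - \tau)Q(\tau)\,d\tau &= -\int_{-\infty}^t \eta\Lambda^2 e^{-\Lambda|t - \tau|}Q(\tau)\,d\tau \\ &= -\eta\Lambda Q(t) + \eta\dot{Q}(t) - \frac{\eta}{\Lambda}\ddot{Q}(t) + \cdots, \end{aligned} \tag{5.44}$$
where the second line results from expanding $Q(\tau)$ as a Taylor series
$$Q(\tau) = Q(t) + (\tau - t)\dot{Q}(t) + \cdots, \tag{5.45}$$
and integrating term-by-term. Now,
$$-\Delta\Omega^2 \equiv \sum_i \left(\frac{f_i^2}{\omega_i^2}\right) = \frac{2}{\pi}\int_0^\infty \frac{J(\omega)}{\omega}\,d\omega = \frac{2}{\pi}\int_0^\infty \frac{\eta\Lambda^2}{\Lambda^2 + \omega^2}\,d\omega = \eta\Lambda. \tag{5.46}$$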


The $-\Delta\Omega^2 Q$ counter-term thus cancels the leading term $-\eta\Lambda Q(t)$ in (5.44),
which would otherwise represent a $\Lambda$-dependent frequency shift. After this
cancellation we can safely let $\Lambda \to \infty$, and so ignore terms with negative
powers of the cutoff. The only surviving term in (5.44) is then $\eta\dot{Q}$. This
we substitute into (5.38), which becomes the equation for viscously damped
motion:
$$\ddot{Q} + \eta\dot{Q} + \Omega^2 Q = 0. \tag{5.47}$$
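
The large-cutoff expansion of the memory integral in (5.44) can be checked numerically. The sketch below assumes NumPy and a hypothetical smooth trajectory $Q(t) = \cos t$; the values of `eta`, `Lam` and `t` are arbitrary. After substituting $\tau = t - u/\Lambda$, the memory integral becomes $-\eta\Lambda\int_0^\infty e^{-u}Q(t - u/\Lambda)\,du$, which Gauss-Laguerre quadrature evaluates directly.

```python
import numpy as np

eta, Lam, t = 1.0, 50.0, 0.7          # illustrative values; Lam is the cutoff Lambda
Q = np.cos                            # hypothetical trajectory Q(t) = cos t
Q_dot = lambda s: -np.sin(s)
Q_ddot = lambda s: -np.cos(s)

# Gauss-Laguerre nodes and weights for integrals of the form int_0^inf exp(-u) g(u) du.
nodes, weights = np.polynomial.laguerre.laggauss(40)
memory = -eta * Lam * float(np.sum(weights * Q(t - nodes / Lam)))

# First three terms of the expansion in inverse powers of the cutoff.
expansion = -eta * Lam * Q(t) + eta * Q_dot(t) - (eta / Lam) * Q_ddot(t)

print(memory, expansion, memory - expansion)
```

The residual is of order $\eta/\Lambda^2$, so it shrinks as the cutoff is raised, while the $-\eta\Lambda Q$ piece is exactly what the counter-term removes.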
The oscillators in the bath absorb energy but, unlike a pair of coupled oscillators
which trade energy rhythmically back-and-forth, the incommensurate
motion of the many $q_i$ prevents them from cooperating for long enough to
return any energy to $Q(t)$.


5.2.3 Modified Green function


When the equation $Ly = 0$ has a non-trivial solution, there can be no unique
solution to $Ly = f$, but there still will be solutions provided $f$ is orthogonal
to all solutions of $L^\dagger y = 0$.

Example: Consider
$$Ly = -\partial_x^2 y = f(x), \qquad y'(0) = y'(1) = 0. \tag{5.48}$$
The equation $Ly = 0$ has one non-trivial solution, $y(x) = 1$. The operator
$L$ is self-adjoint, $L^\dagger = L$, and so there will be solutions to $Ly = f$ provided
$\langle 1, f\rangle = \int_0^1 f\,dx = 0$.

We cannot define the Green function as a solution to
$$-\partial_x^2 G(x, \xi) = \delta(x - \xi), \tag{5.49}$$
because $\int_0^1 \delta(x - \xi)\,dx = 1 \neq 0$, but we can seek a solution to
$$-\partial_x^2 G(x, \xi) = \delta(x - \xi) - 1, \tag{5.50}$$
as the right-hand side integrates to zero.
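A general solution to $-\partial_x^2 y = -1$ is
$$y = A + Bx + \frac{1}{2}x^2, \tag{5.51}$$
and the functions
$$\begin{aligned} y_L &= A + \frac{1}{2}x^2, \\ y_R &= C - x + \frac{1}{2}x^2, \end{aligned} \tag{5.52}$$
obey the boundary conditions at the left and right ends of the interval, respectively.
Continuity at $x = \xi$ demands that $A = C - \xi$, and we are left
with
$$G(x, \xi) = \begin{cases} C - \xi + \frac{1}{2}x^2, & 0 < x < \xi, \\ C - x + \frac{1}{2}x^2, & \xi < x < 1. \end{cases} \tag{5.53}$$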
There is no freedom left to impose the condition
$$G'(\xi - \varepsilon, \xi) - G'(\xi + \varepsilon, \xi) = 1, \tag{5.54}$$
but it is automatically satisfied! Indeed,
$$\begin{aligned} G'(\xi - \varepsilon, \xi) &= \xi, \\ G'(\xi + \varepsilon, \xi) &= -1 + \xi. \end{aligned} \tag{5.55}$$
We may select a different value of $C$ for each $\xi$, and a convenient choice
is
$$C = \frac{1}{2}\xi^2 + \frac{1}{3}, \tag{5.56}$$
which makes $G$ symmetric:
$$G(x, \xi) = \begin{cases} \dfrac{1}{3} - \xi + \dfrac{x^2 + \xi^2}{2}, & 0 < x < \xi < 1, \\[1ex] \dfrac{1}{3} - x + \dfrac{x^2 + \xi^2}{2}, & 0 < \xi < x < 1. \end{cases} \tag{5.57}$$
It also makes $\int_0^1 G(x, \xi)\,dx = 0$.

Figure 5.4: The modified Green function.

The solution to $Ly = f$ is
$$y(x) = \int_0^1 G(x, \xi) f(\xi)\,d\xi + A, \tag{5.58}$$
where $A$ is arbitrary.
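
The properties claimed for the modified Green function are easy to test numerically. The following sketch assumes NumPy; the source $f(x) = \cos \pi x$ (which integrates to zero over $[0,1]$) and the names `g_mod`, `integrate` and `y` are choices made for this illustration. It uses the observation that the symmetric kernel can be written compactly as $\tfrac13 - \max(x,\xi) + \tfrac12(x^2 + \xi^2)$.

```python
import numpy as np

def g_mod(x, xi):
    """Symmetric modified Green function for -y'' = f with y'(0) = y'(1) = 0."""
    return 1.0 / 3.0 - np.maximum(x, xi) + 0.5 * (x**2 + xi**2)

def integrate(values, grid):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

grid = np.linspace(0.0, 1.0, 4001)
f = lambda s: np.cos(np.pi * s)       # satisfies the solvability condition <1, f> = 0

def y(x):
    """Particular solution obtained from the modified Green function (with A = 0)."""
    return integrate(g_mod(x, grid) * f(grid), grid)

print("integral of G over x:", integrate(g_mod(grid, 0.3), grid))
print("y(0.3) =", y(0.3), " expected", np.cos(0.3 * np.pi) / np.pi**2)
```

Because the kernel is orthogonal to the zero mode, the solution it produces has zero mean, which here singles out $y = \cos(\pi x)/\pi^2$ from the one-parameter family of solutions.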
5.3 Applications of Lagrange’s identity


5.3.1 Hermiticity of Green functions 


Earlier we noted the symmetry of the Green function for the Sturm-Liouville 
equation. We will now establish the corresponding result for general differ- 
ential operators. 

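Let $G(x, \xi)$ obey $L_x G(x, \xi) = \delta(x - \xi)$ with homogeneous boundary conditions
$B$, and let $G^\dagger(x, \xi)$ obey $L_x^\dagger G^\dagger(x, \xi) = \delta(x - \xi)$ with adjoint boundary
conditions $B^\dagger$. Then, from Lagrange’s identity, we have
$$\begin{aligned} \left[Q(G, G^\dagger)\right]_a^b &= \int_a^b dx\left\{\left(L_x^\dagger G^\dagger(x, \xi)\right)^* G(x, \xi') - \left(G^\dagger(x, \xi)\right)^* L_x G(x, \xi')\right\} \\ &= \int_a^b dx\left\{\delta(x - \xi)\,G(x, \xi') - \left(G^\dagger(x, \xi)\right)^*\delta(x - \xi')\right\} \\ &= G(\xi, \xi') - \left(G^\dagger(\xi', \xi)\right)^*. \end{aligned} \tag{5.59}$$
Thus, provided $[Q(G, G^\dagger)]_a^b = 0$, which is indeed the case because the boundary
conditions for $L$, $L^\dagger$ are mutually adjoint, we have
$$G^\dagger(\xi, x) = \left(G(x, \xi)\right)^*, \tag{5.60}$$
and the Green functions, regarded as matrices with continuous rows and
columns, are Hermitian conjugates of one another.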
Example: Let
$$L = \frac{d}{dx}, \qquad D(L) = \{y, Ly \in L^2[0, 1] : y(0) = 0\}. \tag{5.61}$$
In this case $G(x, \xi) = \theta(x - \xi)$.

Now, we have
$$L^\dagger = -\frac{d}{dx}, \qquad D(L^\dagger) = \{y, Ly \in L^2[0, 1] : y(1) = 0\}, \tag{5.62}$$
and $G^\dagger(x, \xi) = \theta(\xi - x)$.

Figure 5.5: $G(x, \xi) = \theta(x - \xi)$, and $G^\dagger(x, \xi) = \theta(\xi - x)$.


5.3.2 Inhomogeneous boundary conditions 


Our differential operators have been defined with linear homogeneous bound- 
ary conditions. We can, however, use them, and their Green-function in- 
verses, to solve differential equations with inhomogeneous boundary condi- 
tions. 

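Suppose, for example, we wish to solve
$$-\partial_x^2 y = f(x), \qquad y(0) = a, \quad y(1) = b. \tag{5.63}$$
We already know the Green function for the homogeneous boundary-condition
problem with operator
$$L = -\partial_x^2, \qquad D(L) = \{y, Ly \in L^2[0, 1] : y(0) = 0,\ y(1) = 0\}. \tag{5.64}$$
It is
$$G(x, \xi) = \begin{cases} x(1 - \xi), & x < \xi, \\ \xi(1 - x), & x > \xi. \end{cases} \tag{5.65}$$
Now we apply Lagrange’s identity to $\chi(x) = G(x, \xi)$ and $y(x)$ to get
$$\int_0^1 dx\left\{G(x, \xi)\left(-\partial_x^2 y(x)\right) - y(x)\left(-\partial_x^2 G(x, \xi)\right)\right\} = \left[G'(x, \xi)y(x) - G(x, \xi)y'(x)\right]_0^1. \tag{5.66}$$
Here, as usual, $G'(x, \xi) = \partial_x G(x, \xi)$. The integral is equal to
$$\int_0^1 dx\left\{G(x, \xi)f(x) - y(x)\delta(x - \xi)\right\} = \int_0^1 G(x, \xi)f(x)\,dx - y(\xi), \tag{5.67}$$
whilst the integrated-out bit is
$$-(1 - \xi)y(0) - 0\,y'(0) - \xi y(1) + 0\,y'(1). \tag{5.68}$$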
Therefore, we have
$$y(\xi) = \int_0^1 G(x, \xi)f(x)\,dx + (1 - \xi)y(0) + \xi y(1). \tag{5.69}$$
Here the term with $f(x)$ is the particular integral, whilst the remaining terms
constitute the complementary function (obeying the differential equation
without the source term) which serves to satisfy the boundary conditions.
Observe that the arguments in $G(x, \xi)$ are not in the usual order, but, in the
present example, this does not matter because $G$ is symmetric.
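
A short numerical sketch shows the particular integral and the complementary function working together. It assumes NumPy; the boundary values `a`, `b`, the source $f(x) = x$, and the helper names are illustrative assumptions. For this source the exact solution, found by integrating twice, is $y = a + (b - a + \tfrac16)x - \tfrac16 x^3$.

```python
import numpy as np

def green(x, xi):
    """Dirichlet Green function for -d^2/dx^2 on [0, 1]."""
    return np.where(x < xi, x * (1.0 - xi), xi * (1.0 - x))

def solve(f, a, b, xi, n=4001):
    """Particular integral plus the boundary terms (1 - xi) y(0) + xi y(1)."""
    x = np.linspace(0.0, 1.0, n)
    vals = green(x, xi) * f(x)
    particular = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(x)))
    return particular + (1.0 - xi) * a + xi * b

a, b = 2.0, -1.0                      # illustrative boundary data y(0) = a, y(1) = b
f = lambda x: x                       # illustrative source term
exact = lambda x: a + (b - a + 1.0 / 6.0) * x - x**3 / 6.0

for xi in (0.0, 0.4, 1.0):
    print(xi, solve(f, a, b, xi), exact(xi))
```

At the endpoints the Green-function term vanishes, so the boundary values are reproduced by the complementary function alone.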

When the operator $L$ is not self-adjoint, we need to distinguish between
$L$ and $L^\dagger$, and $G$ and $G^\dagger$. We then apply Lagrange’s identity to the unknown
function $u(x)$ and $\chi(x) = G^\dagger(x, \xi)$.
Example: We will use the Green-function method to solve the differential
equation
$$\frac{du}{dx} = f(x), \qquad x \in [0, 1], \quad u(0) = a. \tag{5.70}$$
We can, of course, write down the answer to this problem directly, but it
is interesting to see how the general strategy produces the solution. We
first find the Green function $G(x, \xi)$ for the operator with the corresponding
homogeneous boundary conditions. In the present case, this operator is
$$L = \partial_x, \qquad D(L) = \{u, Lu \in L^2[0, 1] : u(0) = 0\}, \tag{5.71}$$
and the appropriate Green function is $G(x, \xi) = \theta(x - \xi)$. From $G$ we then
read off the adjoint Green function as $G^\dagger(x, \xi) = \left(G(\xi, x)\right)^*$. In the present
example, we have $G^\dagger(x, \xi) = \theta(\xi - x)$. We now use Lagrange’s identity in
the form
$$\int_0^1 dx\left\{\left(L_x^\dagger G^\dagger(x, \xi)\right)^* u(x) - \left(G^\dagger(x, \xi)\right)^* L_x u(x)\right\} = \left[Q(G^\dagger, u)\right]_0^1. \tag{5.72}$$
In all cases, the left hand side is equal to
$$\int_0^1 dx\left\{\delta(x - \xi)u(x) - G^T(x, \xi)f(x)\right\}, \tag{5.73}$$
where $T$ denotes transpose, $G^T(x, \xi) = G(\xi, x)$. The left hand side is therefore
equal to
$$u(\xi) - \int_0^1 dx\,G(\xi, x)f(x). \tag{5.74}$$
The right hand side depends on the details of the problem. In the present
case, the integrated out part is
$$\left[Q(G^\dagger, u)\right]_0^1 = -\left[G^T(x, \xi)u(x)\right]_0^1 = u(0). \tag{5.75}$$
At the last step we have used the specific form $G^T(x, \xi) = \theta(\xi - x)$ to find
that only the lower limit contributes. The end result is therefore the expected
one:
$$u(y) = u(0) + \int_0^y f(x)\,dx. \tag{5.76}$$
Variations of this strategy enable us to solve any inhomogeneous boundary-value
problem in terms of the Green function for the corresponding homogeneous
boundary-value problem.


5.4 Eigenfunction expansions 


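Self-adjoint operators possess a complete set of eigenfunctions, and we can
expand the Green function in terms of these. Let
$$L\varphi_n = \lambda_n\varphi_n. \tag{5.77}$$
Let us further suppose that none of the $\lambda_n$ are zero. Then the Green function
has the eigenfunction expansion
$$G(x, \xi) = \sum_n \frac{\varphi_n(x)\varphi_n^*(\xi)}{\lambda_n}. \tag{5.78}$$
That this is so follows from
$$\begin{aligned} L_x\left(\sum_n \frac{\varphi_n(x)\varphi_n^*(\xi)}{\lambda_n}\right) &= \sum_n \frac{\left(L_x\varphi_n(x)\right)\varphi_n^*(\xi)}{\lambda_n} \\ &= \sum_n \frac{\lambda_n\varphi_n(x)\varphi_n^*(\xi)}{\lambda_n} \\ &= \sum_n \varphi_n(x)\varphi_n^*(\xi) \\ &= \delta(x - \xi). \end{aligned} \tag{5.79}$$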
Example: Consider our familiar exemplar
$$L = -\partial_x^2, \qquad D(L) = \{y, Ly \in L^2[0, 1] : y(0) = y(1) = 0\}, \tag{5.80}$$
for which
$$G(x, \xi) = \begin{cases} x(1 - \xi), & x < \xi, \\ \xi(1 - x), & x > \xi. \end{cases} \tag{5.81}$$
Computing the Fourier series shows that
$$G(x, \xi) = \sum_{n=1}^\infty \left(\frac{2}{n^2\pi^2}\right)\sin(n\pi x)\sin(n\pi\xi). \tag{5.82}$$
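
The agreement between the closed form and the eigenfunction series can be seen by summing a finite number of terms. This is a minimal sketch assuming NumPy; the truncation at 2000 terms and the function names are arbitrary choices. The normalized eigenfunctions are $\sqrt{2}\sin(n\pi x)$ with eigenvalues $n^2\pi^2$.

```python
import numpy as np

def green_closed(x, xi):
    """Closed-form Dirichlet Green function for -d^2/dx^2 on [0, 1]."""
    return float(np.where(x < xi, x * (1.0 - xi), xi * (1.0 - x)))

def green_series(x, xi, n_terms=2000):
    """Partial sum of the eigenfunction expansion, truncated at n_terms."""
    n = np.arange(1, n_terms + 1)
    terms = 2.0 * np.sin(n * np.pi * x) * np.sin(n * np.pi * xi) / (n * np.pi) ** 2
    return float(np.sum(terms))

for x, xi in [(0.3, 0.6), (0.6, 0.3), (0.5, 0.5)]:
    print(x, xi, green_closed(x, xi), green_series(x, xi))
```

The terms fall off like $1/n^2$, so the partial sums converge, although only slowly near the kink at $x = \xi$.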


Modified Green function 


When one or more of the eigenvalues is zero, a modified Green function is
obtained by simply omitting the corresponding terms from the series:
$$G_{\rm mod}(x, \xi) = \sum_{\lambda_n \neq 0} \frac{\varphi_n(x)\varphi_n^*(\xi)}{\lambda_n}. \tag{5.83}$$
Then
$$L_x G_{\rm mod}(x, \xi) = \delta(x - \xi) - \sum_{\lambda_n = 0} \varphi_n(x)\varphi_n^*(\xi). \tag{5.84}$$
We see that this $G_{\rm mod}$ is still hermitian, and, as a function of $x$, is orthogonal
to the zero modes. These are the properties we elected when constructing
the modified Green function in equation (5.57).


5.5 Analytic properties of Green functions 


In this section we study the properties of Green functions considered as 
functions of a complex variable. Some of the formulae are slightly easier to
derive using contour integral methods, but these are not necessary and we will 
not use them here. The only complex-variable prerequisite is a familiarity 
with complex arithmetic and, in particular, knowledge of how to take the 
logarithm and the square root of a complex number. 
5.5.1 Causality implies analyticity


Consider a Green function of the form $G(t - \tau)$ and possessing the causal
property that $G(t - \tau) = 0$ for $t < \tau$. If the improper integral defining its
Fourier transform,
$$\widetilde{G}(\omega) = \int_{-\infty}^\infty e^{i\omega t}G(t)\,dt = \lim_{T \to \infty}\left\{\int_0^T e^{i\omega t}G(t)\,dt\right\}, \tag{5.85}$$
converges for real $\omega$, it will converge even better when $\omega$ has a positive
imaginary part. Consequently $\widetilde{G}(\omega)$ will be a well-behaved function of the
complex variable $\omega$ everywhere in the upper half of the complex $\omega$ plane.
Indeed, it will be analytic there, meaning that its Taylor series expansion
about any point actually converges to the function. For example, the Green
function for the damped harmonic oscillator
$$G(t) = \begin{cases} \dfrac{1}{\Omega}e^{-\gamma t}\sin(\Omega t), & t > 0, \\[1ex] 0, & t < 0, \end{cases} \tag{5.86}$$
has Fourier transform
$$\widetilde{G}(\omega) = \frac{1}{\Omega^2 - (\omega + i\gamma)^2}, \tag{5.87}$$
which is always finite in the upper half-plane, although it has pole singularities
at $\omega = -i\gamma \pm \Omega$ in the lower half-plane.
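
The Fourier transform of the damped-oscillator Green function can be confirmed by direct quadrature, both for real frequency and for a frequency in the upper half-plane. The sketch below assumes NumPy; the parameter values, the truncation time `T`, and the function names are choices made for the illustration.

```python
import numpy as np

gamma, Omega = 0.5, 3.0                    # illustrative damping and frequency

def G(t):
    """Causal damped-oscillator Green function for t >= 0."""
    return np.exp(-gamma * t) * np.sin(Omega * t) / Omega

def G_tilde_numeric(omega, T=80.0, n=400001):
    """Integral of exp(i omega t) G(t) over [0, T]; exp(-gamma T) is negligible here."""
    t = np.linspace(0.0, T, n)
    vals = np.exp(1j * omega * t) * G(t)
    return complex(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t)))

def G_tilde_closed(omega):
    """Closed form 1 / (Omega^2 - (omega + i gamma)^2)."""
    return 1.0 / (Omega**2 - (omega + 1j * gamma) ** 2)

for omega in (1.2, 1.2 + 0.7j):
    print(omega, G_tilde_numeric(omega), G_tilde_closed(omega))
```

Giving `omega` a positive imaginary part only adds a further decaying factor to the integrand, which is the sense in which the integral "converges even better" there.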

The only way that the Fourier transform $\widetilde{G}$ of a causal Green function can
have a pole singularity in the upper half-plane is if $G$ contains an exponential
factor growing in time, in which case the system is unstable to perturbations
(and the real-frequency Fourier transform does not exist). This observation
is at the heart of the Nyquist criterion for the stability of linear electronic
devices.

Inverting the Fourier transform, we have
$$G(t) = \int_{-\infty}^\infty \frac{1}{\Omega^2 - (\omega + i\gamma)^2}\,e^{-i\omega t}\,\frac{d\omega}{2\pi} = \theta(t)\,\frac{1}{\Omega}\,e^{-\gamma t}\sin(\Omega t). \tag{5.88}$$
It is perhaps surprising that this integral is identically zero if $t < 0$, and
non-zero if $t > 0$. This is one of the places where contour integral methods
might cast some light, but because we have confidence in the Fourier inversion
formula, we know that it must be correct.
Remember that in deriving (5.88) we have explicitly assumed that the
damping coefficient $\gamma$ is positive. It is important to realize that reversing the
sign of $\gamma$ on the left-hand side of (5.88) does more than just change $e^{-\gamma t} \to e^{\gamma t}$
on the right-hand side. Naively setting $\gamma \to -\gamma$ on both sides of (5.88) gives
an equation that cannot possibly be true. The left-hand side would be the
Fourier transform of a smooth function, and the Riemann-Lebesgue lemma
tells us that such a Fourier transform must become zero when $|t| \to \infty$. The
right-hand side, to the contrary, would be a function whose oscillations grow
without bound as $t$ becomes large and positive.

To find the correct equation, observe that we can legitimately effect the
sign-change $\gamma \to -\gamma$ by first complex-conjugating the integral and then
changing $t$ to $-t$. Performing these two operations on both sides of (5.88)
leads to
$$\int_{-\infty}^\infty \frac{1}{\Omega^2 - (\omega - i\gamma)^2}\,e^{-i\omega t}\,\frac{d\omega}{2\pi} = -\theta(-t)\,\frac{1}{\Omega}\,e^{\gamma t}\sin(\Omega t). \tag{5.89}$$
The new right-hand side represents an exponentially growing oscillation that
is suddenly silenced by the kick at $t = 0$.


Figure 5.6: The effect on $G(t)$, the Green function of an undamped oscillator,
of changing $i\gamma$ from $+i\varepsilon$ to $-i\varepsilon$.


The effect of taking the damping parameter $\gamma$ from an infinitesimally small
positive value $\varepsilon$ to an infinitesimally small negative value $-\varepsilon$ is therefore to
turn the causal Green function (no motion before it is started by the delta-function
kick) of the undamped oscillator into an anti-causal Green function
(no motion after it is stopped by the kick). Ultimately, this is because the
differential operator corresponding to a harmonic oscillator with initial-value
data is not self-adjoint, and its adjoint operator corresponds to a harmonic
oscillator with final-value data.
This discontinuous dependence on an infinitesimal damping parameter is
the subject of the next few sections.


Physics application: Caldeira-Leggett in frequency space 


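If we write the Caldeira-Leggett equations of motion (5.36) in Fourier frequency
space by setting
$$Q(t) = \int_{-\infty}^\infty \frac{d\omega}{2\pi}\,Q(\omega)e^{-i\omega t}, \tag{5.90}$$
and
$$q_i(t) = \int_{-\infty}^\infty \frac{d\omega}{2\pi}\,q_i(\omega)e^{-i\omega t}, \tag{5.91}$$
we have (after including an external force $F_{\rm ext}$ to drive the system)
$$\begin{aligned} \left(-\omega^2 + (\Omega^2 - \Delta\Omega^2)\right)Q(\omega) + \sum_i f_i q_i(\omega) &= F_{\rm ext}(\omega), \\ (-\omega^2 + \omega_i^2)q_i(\omega) + f_i Q(\omega) &= 0. \end{aligned} \tag{5.92}$$
Eliminating the $q_i$, we obtain
$$\left(-\omega^2 + (\Omega^2 - \Delta\Omega^2)\right)Q(\omega) - \sum_i \frac{f_i^2}{\omega_i^2 - \omega^2}\,Q(\omega) = F_{\rm ext}(\omega). \tag{5.93}$$
As before, sums over the index $i$ are replaced by integrals over the spectral
function
$$\sum_i \frac{f_i^2}{\omega_i^2 - \omega^2} = \frac{2}{\pi}\int_0^\infty \frac{\omega' J(\omega')}{\omega'^2 - \omega^2}\,d\omega', \tag{5.94}$$
and
$$-\Delta\Omega^2 \equiv \sum_i \left(\frac{f_i^2}{\omega_i^2}\right) = \frac{2}{\pi}\int_0^\infty \frac{J(\omega')}{\omega'}\,d\omega'. \tag{5.95}$$
Then
$$Q(\omega) = \left(\frac{1}{\Omega^2 - \omega^2 + \Pi(\omega)}\right)F_{\rm ext}(\omega), \tag{5.96}$$
where the self-energy $\Pi(\omega)$ is given by
$$\Pi(\omega) = \frac{2}{\pi}\int_0^\infty \left\{\frac{J(\omega')}{\omega'} - \frac{\omega' J(\omega')}{\omega'^2 - \omega^2}\right\}d\omega'. \tag{5.97}$$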
The expression
$$G(\omega) \equiv \frac{1}{\Omega^2 - \omega^2 + \Pi(\omega)} \tag{5.98}$$
is a typical response function. Analogous objects occur in all branches of
physics.

For viscous damping we know that $J(\omega) = \eta\omega$. Let us evaluate the
integral occurring in $\Pi(\omega)$ for this case:
$$I(\omega) = \int_0^\infty \frac{d\omega'}{\omega'^2 - \omega^2}. \tag{5.99}$$

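We will initially assume that $\omega$ is positive. Now,
$$\frac{1}{\omega'^2 - \omega^2} = \frac{1}{2\omega}\left(\frac{1}{\omega' - \omega} - \frac{1}{\omega' + \omega}\right), \tag{5.100}$$
so
$$I(\omega) = \left[\frac{1}{2\omega}\left(\ln(\omega' - \omega) - \ln(\omega' + \omega)\right)\right]_{\omega' = 0}^{\infty}. \tag{5.101}$$
At the upper limit we have $\ln\left((\infty - \omega)/(\infty + \omega)\right) = \ln 1 = 0$. The lower
limit contributes
$$-\frac{1}{2\omega}\left(\ln(-\omega) - \ln(\omega)\right). \tag{5.102}$$
To evaluate the logarithm of a negative quantity we must use
$$\ln\omega = \ln|\omega| + i\arg\omega, \tag{5.103}$$
where we will take $\arg\omega$ to lie in the range $-\pi < \arg\omega < \pi$.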


Figure 5.7: When $\omega$ has a small positive imaginary part, $\arg(-\omega) \approx -\pi$.
To get an unambiguous answer, we need to give $\omega$ an infinitesimal imaginary
part $\pm i\varepsilon$. Depending on the sign of this imaginary part, we find that
$$I(\omega \pm i\varepsilon) = \pm\frac{i\pi}{2\omega}. \tag{5.104}$$
This formula remains true when the real part of $\omega$ is negative, and so
$$\Pi(\omega \pm i\varepsilon) = \mp i\eta\omega. \tag{5.105}$$
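
The sign flip of $I(\omega)$ across the real axis can be seen numerically by giving $\omega$ a finite imaginary part, for which the integral is perfectly well defined. The sketch below assumes NumPy and uses the substitution $\omega' = \tan\theta$ to map the half-line onto $[0, \pi/2]$; the sample point $1 \pm 0.2i$ and the name `I_numeric` are illustrative assumptions.

```python
import numpy as np

def I_numeric(omega, n=200001):
    """Integral of d(omega') / (omega'^2 - omega^2) over [0, inf), for complex omega.

    With omega' = tan(theta) the integrand becomes
    1 / (sin^2(theta) - omega^2 cos^2(theta)) on 0 <= theta <= pi/2.
    """
    theta = np.linspace(0.0, 0.5 * np.pi, n)
    vals = 1.0 / (np.sin(theta) ** 2 - omega**2 * np.cos(theta) ** 2)
    return complex(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(theta)))

w_up, w_down = 1.0 + 0.2j, 1.0 - 0.2j
print(I_numeric(w_up), 1j * np.pi / (2.0 * w_up))        # upper half-plane
print(I_numeric(w_down), -1j * np.pi / (2.0 * w_down))   # lower half-plane
```

For $J(\omega) = \eta\omega$ the self-energy is $-(2\eta/\pi)\,\omega^2 I(\omega)$, so these two values correspond to $\Pi = -i\eta\omega$ above the real axis and $\Pi = +i\eta\omega$ below it.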


Now the frequency-space version of
$$\ddot{Q}(t) + \eta\dot{Q} + \Omega^2 Q = F_{\rm ext}(t) \tag{5.106}$$
is
$$(-\omega^2 - i\eta\omega + \Omega^2)Q(\omega) = F_{\rm ext}(\omega), \tag{5.107}$$
so we must opt for the small shift in $\omega$ that leads to $\Pi(\omega) = -i\eta\omega$. This
means that we must regard $\omega$ as having a positive infinitesimal imaginary
part, $\omega \to \omega + i\varepsilon$. This imaginary part is a good and needful thing: it effects
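the replacement of the ill-defined singular integrals
$$G(t) \stackrel{?}{=} \int_0^\infty \frac{1}{\omega_i^2 - \omega^2}\,e^{-i\omega t}\,d\omega, \tag{5.108}$$
which arise as we transform back to real time, with the unambiguous expressions
$$G_\varepsilon(t) = \int_0^\infty \frac{1}{\omega_i^2 - (\omega + i\varepsilon)^2}\,e^{-i\omega t}\,d\omega. \tag{5.109}$$
The latter, we know, give rise to properly causal real-time Green functions.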


5.5.2 Plemelj formulae


The functions we are meeting can all be cast in the form
$$f(\omega) = \frac{1}{\pi}\int_a^b \frac{\rho(\omega')}{\omega' - \omega}\,d\omega'. \tag{5.110}$$
If $\omega$ lies in the integration range $[a, b]$, then we divide by zero as we integrate
over $\omega' = \omega$. We ought to avoid doing this, but this interval is often exactly
where we desire to evaluate $f$. As before, we evade the division by zero by
giving $\omega$ an infinitesimally small imaginary part: $\omega \to \omega \pm i\varepsilon$. We can then
apply the Plemelj formulae, named for the Slovenian mathematician Josip
Plemelj, which say that
$$\begin{aligned} \frac{1}{2}\left(f(\omega + i\varepsilon) - f(\omega - i\varepsilon)\right) &= i\rho(\omega), \\ \frac{1}{2}\left(f(\omega + i\varepsilon) + f(\omega - i\varepsilon)\right) &= \frac{1}{\pi}P\int_a^b \frac{\rho(\omega')}{\omega' - \omega}\,d\omega'. \end{aligned} \tag{5.111}$$
As explained in section 2.3.2, the “P” in front of the integral stands for
principal part. Recall that it means that we are to delete an infinitesimal
segment of the $\omega'$ integral lying symmetrically about the singular point $\omega' = \omega$.


Figure 5.8: The analytic function $f(\omega)$ is discontinuous across the real axis
between $a$ and $b$.


The Plemelj formulae mean that the otherwise smooth and analytic function
$f(\omega)$ is discontinuous across the real axis between $a$ and $b$. If the discontinuity
$\rho(\omega)$ is itself an analytic function then the line joining the points
$a$ and $b$ is a branch cut, and the endpoints of the integral are branch-point
singularities of $f(\omega)$.

The reason for the discontinuity may be understood by considering figure
5.9. The singular integrand is a product of $\rho(\omega')$ with
$$\frac{1}{\omega' - (\omega \pm i\varepsilon)} = \frac{\omega' - \omega}{(\omega' - \omega)^2 + \varepsilon^2} \pm \frac{i\varepsilon}{(\omega' - \omega)^2 + \varepsilon^2}. \tag{5.112}$$
The first term on the right is a symmetrically cut-off version of $1/(\omega' - \omega)$ and
provides the principal part integral. The second term sharpens and tends
to the delta function $\pm i\pi\delta(\omega' - \omega)$ as $\varepsilon \to 0$, and so gives $\pm i\pi\rho(\omega)$. Because
of this explanation, the Plemelj equations are commonly encoded in physics
papers via the “$i\varepsilon$” cabbala
$$\frac{1}{\omega' - (\omega \pm i\varepsilon)} = P\left(\frac{1}{\omega' - \omega}\right) \pm i\pi\delta(\omega' - \omega). \tag{5.113}$$
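
The discontinuity across the cut can be observed numerically with a small but finite $\varepsilon$. The following is a sketch assuming NumPy; the interval $[a, b] = [-1, 2]$, the weight $\rho = \cos$, and the values of `omega` and `eps` are arbitrary illustrative choices, and the grid spacing is kept well below `eps` so that the narrow Lorentzian is resolved.

```python
import numpy as np

a, b = -1.0, 2.0
rho = np.cos                              # an arbitrary smooth, real weight function

def f(omega, n=600001):
    """(1/pi) times the integral of rho(w) / (w - omega) over [a, b], for complex omega."""
    w = np.linspace(a, b, n)
    vals = rho(w) / (w - omega)
    return complex(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(w))) / np.pi

omega, eps = 0.5, 1e-3
jump = 0.5 * (f(omega + 1j * eps) - f(omega - 1j * eps))
print(jump, 1j * rho(omega))
```

The half-difference is purely imaginary and approaches $i\rho(\omega)$ as `eps` is reduced, with a residual error of order `eps`.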


Figure 5.9: Sketch of the real and imaginary parts of $g(\omega') = 1/(\omega' - (\omega + i\varepsilon))$.


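If $\rho$ is real, as it often is, then $f(\omega + i\varepsilon) = \left(f(\omega - i\varepsilon)\right)^*$. The discontinuity
across the real axis is then purely imaginary, and
$$\frac{1}{2}\left(f(\omega + i\varepsilon) + f(\omega - i\varepsilon)\right) \tag{5.114}$$
is the real part of $f$. In this case we can write (5.110) as
$$\mathrm{Re}\,f(\omega) = \frac{1}{\pi}P\int_a^b \frac{\mathrm{Im}\,f(\omega')}{\omega' - \omega}\,d\omega'. \tag{5.115}$$
This formula is typical of the relations linking the real and imaginary parts
of causal response functions.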

A practical example of such a relation is provided by the complex, frequency-dependent,
refractive index, $n(\omega)$, of a medium. This is defined so that a
travelling electromagnetic wave takes the form
$$E(x, t) = E_0\,e^{i n(\omega)kx - i\omega t}. \tag{5.116}$$
Here, $k = \omega/c$ is the in vacuo wavenumber. We can decompose $n$ into its
real and imaginary parts:
$$\begin{aligned} n(\omega) &= n_R + i n_I \\ &= n_R(\omega) + \frac{i}{2|k|}\gamma(\omega), \end{aligned} \tag{5.117}$$
where $\gamma$ is the extinction coefficient, defined so that the intensity falls off
as $I = I_0\exp(-\gamma x)$. A non-zero $\gamma$ can arise from either energy absorption
or scattering out of the forward direction. For the refractive index, the
function $f(\omega) = n(\omega) - 1$ can be written in the form of (5.110), and, using
$n(-\omega) = n^*(\omega)$, this leads to the Kramers-Kronig relation
$$n_R(\omega) = 1 + \frac{c}{\pi}P\int_0^\infty \frac{\gamma(\omega')}{\omega'^2 - \omega^2}\,d\omega'. \tag{5.118}$$
Formulae like this will be rigorously derived in chapter 18 by the use of
contour-integral methods.


5.5.3 Resolvent operator 


Given a differential operator $L$, we define the resolvent operator to be
$R_\lambda \equiv (L - \lambda I)^{-1}$. The resolvent is an analytic function of $\lambda$, except when $\lambda$ lies in
the spectrum of $L$.
We expand $R_\lambda$ in terms of the eigenfunctions as
$$R_\lambda(x, \xi) = \sum_n \frac{\varphi_n(x)\varphi_n^*(\xi)}{\lambda_n - \lambda}. \tag{5.119}$$
When the spectrum is discrete, the resolvent has poles at the eigenvalues of
$L$. When the operator $L$ has a continuous spectrum, the sum becomes an
integral:
$$R_\lambda(x, \xi) = \int_{\mu \in \sigma(L)} \rho(\mu)\,\frac{\varphi_\mu(x)\varphi_\mu^*(\xi)}{\mu - \lambda}\,d\mu, \tag{5.120}$$
where $\rho(\mu)$ is the eigenvalue density of states. This is of the form that
we saw in connection with the Plemelj formulae. Consequently, when the
spectrum comprises segments of the real axis, the resulting analytic function
$R_\lambda$ will be discontinuous across the real axis within them. The endpoints
of the segments will be branch-point singularities of $R_\lambda$, and the segments
themselves, considered as subsets of the complex plane, are the branch cuts.
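The trace of the resolvent $\mathrm{Tr}\,R_\lambda$ is defined by
$$\begin{aligned} \mathrm{Tr}\,R_\lambda &= \int dx\,\{R_\lambda(x, x)\} \\ &= \int dx\left\{\int \rho(\mu)\,\frac{\varphi_\mu(x)\varphi_\mu^*(x)}{\mu - \lambda}\,d\mu\right\} \\ &= \int \frac{\rho(\mu)}{\mu - \lambda}\,d\mu. \end{aligned} \tag{5.121}$$
Applying Plemelj to $R_\lambda$, we have
$$\mathrm{Im}\left[\lim_{\varepsilon \to 0}\left\{\mathrm{Tr}\,R_{\lambda + i\varepsilon}\right\}\right] = \pi\rho(\lambda). \tag{5.122}$$
Here, we have used the fact that $\rho$ is real, so
$$\mathrm{Tr}\,R_{\lambda - i\varepsilon} = \left(\mathrm{Tr}\,R_{\lambda + i\varepsilon}\right)^*. \tag{5.123}$$
The non-zero imaginary part therefore shows that $R_\lambda$ is discontinuous across
the real axis at points lying in the continuous spectrum.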
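Example: Consider
$$L = -\partial_x^2 + m^2, \qquad D(L) = \{y, Ly \in L^2[-\infty, \infty]\}. \tag{5.124}$$
As we know, this operator has a continuous spectrum, with eigenfunctions
$$\varphi_k = \frac{1}{\sqrt{L}}e^{ikx}. \tag{5.125}$$
Here, $L$ is the (very large) length of the interval. The eigenvalues are
$E = k^2 + m^2$, so the spectrum is all positive numbers greater than $m^2$. The
momentum density of states is
$$\rho(k) = \frac{L}{2\pi}. \tag{5.126}$$
The completeness relation is
$$\int_{-\infty}^\infty \frac{dk}{2\pi}\,e^{ik(x - \xi)} = \delta(x - \xi), \tag{5.127}$$
which is just the Fourier integral formula for the delta function.
The Green function for $L$ is
$$G(x - y) = \int_{-\infty}^\infty dk\left(\frac{dn}{dk}\right)\frac{\varphi_k(x)\varphi_k^*(y)}{k^2 + m^2} = \int_{-\infty}^\infty \frac{dk}{2\pi}\,\frac{e^{ik(x - y)}}{k^2 + m^2} = \frac{1}{2m}e^{-m|x - y|}. \tag{5.128}$$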
Figure 5.10: If $\mathrm{Im}\,\lambda > 0$, and with the branch cut for $\sqrt{z}$ in its usual place
along the negative real axis, then $\sqrt{-\lambda}$ has negative imaginary part and
positive real part.


We can use the same calculation to look at the resolvent $R_\lambda = (-\partial_x^2 - \lambda)^{-1}$.
Replacing $m^2$ by $-\lambda$, we have

$$
R_\lambda(x,y) = \frac{1}{2\sqrt{-\lambda}}\,e^{-\sqrt{-\lambda}\,|x-y|}. \tag{5.129}
$$

To appreciate this expression, we need to know how to evaluate $\sqrt{z}$ where
$z$ is complex. We write $z = |z|e^{i\phi}$ where we require $-\pi < \phi < \pi$. We now
define

$$
\sqrt{z} = \sqrt{|z|}\,e^{i\phi/2}. \tag{5.130}
$$

When we evaluate $\sqrt{z}$ for $z$ just below the negative real axis then this definition
gives $-i\sqrt{|z|}$, and just above the axis we find $+i\sqrt{|z|}$. The discontinuity
means that the negative real axis is a branch cut for the square-root function.
The $\sqrt{-\lambda}$'s appearing in $R_\lambda$ therefore mean that the positive real axis
will be a branch cut for $R_\lambda$. This branch cut therefore coincides with the
spectrum of L, as promised earlier.
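The jump across the cut can be seen directly with Python's `cmath.sqrt`, which uses the same principal branch (cut along the negative real axis) as the definition (5.130). This is a small illustrative sketch; the function name `resolvent`, the offset `eps` and the sample value $\lambda = 4$ are assumptions made for the demonstration.

```python
import cmath

def resolvent(lam, x, y):
    """R_lambda(x, y) = exp(-sqrt(-lam) |x - y|) / (2 sqrt(-lam)), as in (5.129)."""
    root = cmath.sqrt(-lam)  # principal branch, cut along the negative real axis
    return cmath.exp(-root * abs(x - y)) / (2.0 * root)

eps = 1e-9

# The square root itself, just above and just below the negative real axis.
above = cmath.sqrt(complex(-4.0, +eps))   # close to +2i
below = cmath.sqrt(complex(-4.0, -eps))   # close to -2i

# The diagonal of the resolvent on either side of the positive real lambda axis.
lam = 4.0
r_plus = resolvent(complex(lam, +eps), 0.3, 0.3)    # close to +i / (2 sqrt(lam))
r_minus = resolvent(complex(lam, -eps), 0.3, 0.3)   # close to -i / (2 sqrt(lam))
```

The two diagonal values differ only in the sign of their imaginary part, which is the discontinuity that Plemelj's formula relates to the density of states.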

If $\lambda$ is positive and we shift $\lambda \to \lambda + i\varepsilon$ then

$$
\frac{1}{2\sqrt{-\lambda}}\,e^{-\sqrt{-\lambda}\,|x-y|} \to \frac{i}{2\sqrt{\lambda}}\,e^{+i\sqrt{\lambda}\,|x-y| - \varepsilon|x-y|/2\sqrt{\lambda}}. \tag{5.131}
$$

Notice that this decays away as $|x-y| \to \infty$. The square root retains a
positive real part when $\lambda$ is shifted to $\lambda - i\varepsilon$, and so the decay is still present:

$$
\frac{1}{2\sqrt{-\lambda}}\,e^{-\sqrt{-\lambda}\,|x-y|} \to -\frac{i}{2\sqrt{\lambda}}\,e^{-i\sqrt{\lambda}\,|x-y| - \varepsilon|x-y|/2\sqrt{\lambda}}. \tag{5.132}
$$

In each case, with $\lambda$ either immediately above or immediately below the
cut, the small imaginary part tempers the oscillatory behaviour of the Green
function so that $\chi(x) = G(x,y)$ is square integrable and remains an element
of $L^2[\mathbb{R}]$.

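We now take the trace of $R$ by setting $x = y$ and integrating:

$$
\mathrm{Tr}\,R_{\lambda+i\varepsilon} = i\pi\,\frac{L}{2\pi\sqrt{|\lambda|}}. \tag{5.133}
$$

Thus,

$$
\rho(\lambda) = \theta(\lambda)\,\frac{L}{2\pi\sqrt{|\lambda|}}, \tag{5.134}
$$

which coincides with our direct calculation.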
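Example: Let

$$
L = -i\partial_x, \qquad \mathcal{D}(L) = \{y,\ Ly \in L^2[\mathbb{R}]\}. \tag{5.135}
$$

This has eigenfunctions $e^{ikx}$ with eigenvalues $k$. The spectrum is therefore
the entire real line. The local eigenvalue density of states is $1/2\pi$. The
resolvent is therefore

$$
(-i\partial_x - \lambda)^{-1}_{x,\xi} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik(x-\xi)}\,\frac{1}{k-\lambda}\,dk. \tag{5.136}
$$

To evaluate this, first consider the Fourier transforms of

$$
\begin{aligned}
F_1(x) &= \theta(x)\,e^{-\kappa x}, \\
F_2(x) &= -\theta(-x)\,e^{\kappa x},
\end{aligned} \tag{5.137}
$$

where $\kappa$ is a positive real number.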


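Figure 5.11: The functions $F_1(x) = \theta(x)e^{-\kappa x}$ and $F_2(x) = -\theta(-x)e^{\kappa x}$.

We have

$$
\int_{-\infty}^{\infty} \left\{\theta(x)\,e^{-\kappa x}\right\} e^{-ikx}\,dx = \frac{1}{i}\,\frac{1}{k - i\kappa}, \tag{5.138}
$$

$$
\int_{-\infty}^{\infty} \left\{-\theta(-x)\,e^{\kappa x}\right\} e^{-ikx}\,dx = \frac{1}{i}\,\frac{1}{k + i\kappa}. \tag{5.139}
$$

Inverting the transforms gives

$$
\begin{aligned}
\theta(x)\,e^{-\kappa x} &= \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{1}{k - i\kappa}\,e^{ikx}\,dk, \\
-\theta(-x)\,e^{\kappa x} &= \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{1}{k + i\kappa}\,e^{ikx}\,dk.
\end{aligned} \tag{5.140}
$$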

These are important formulae in their own right, and you should take care
to understand them. Now we apply them to evaluating the integral defining
$R_\lambda$.

If we write $\lambda = \mu + i\nu$, we find

$$
\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik(x-\xi)}\,\frac{1}{k-\lambda}\,dk =
\begin{cases}
i\theta(x-\xi)\,e^{i\mu(x-\xi)}\,e^{-\nu(x-\xi)}, & \nu > 0, \\
-i\theta(\xi-x)\,e^{i\mu(x-\xi)}\,e^{-\nu(x-\xi)}, & \nu < 0.
\end{cases} \tag{5.141}
$$

In each case, the resolvent is $\propto e^{i\lambda x}$ away from $\xi$, and has a jump of $+i$ at
$x = \xi$ so as to produce the delta function. It decays either to the right or to
the left, depending on the sign of $\nu$. The Heaviside factor ensures that it is
multiplied by zero on the exponentially growing side of $e^{-\nu x}$, so as to satisfy
the requirement of square integrability.

Taking the trace of this resolvent is a little problematic. We are to set
$x = \xi$ and integrate, but what value do we associate with $\theta(0)$? Remembering
that Fourier transforms always give the mean of the two values at a jump
discontinuity, it seems reasonable to set $\theta(0) = \frac{1}{2}$. With this definition, we
have

$$
\mathrm{Tr}\,R_\lambda =
\begin{cases}
\frac{i}{2}L, & \mathrm{Im}\,\lambda > 0, \\
-\frac{i}{2}L, & \mathrm{Im}\,\lambda < 0.
\end{cases} \tag{5.142}
$$

Our choice is therefore compatible with $\mathrm{Im}\,\mathrm{Tr}\,R_{\lambda+i\varepsilon} = \pi\rho = L/2$, where
$\rho = L/2\pi$. We have been lucky. The ambiguous expression $\theta(0)$ is not
always safely evaluated as 1/2.

5.6 Locality and the Gelfand-Dikii equation


The answers to many quantum physics problems can be expressed either as 
sums over wavefunctions or as expressions involving Green functions. One 
of the advantages of writing the answer in terms of Green functions is that 
these typically depend only on the local properties of the differential operator 
whose inverse they are. This locality is in contrast to the individual wave- 
functions and their eigenvalues, both of which are sensitive to the distant 
boundaries. Since physics is usually local, it follows that the Green function 
provides a more efficient route to the answer. 

By the Green function being local we mean that its value for $x, \xi$ near
some point can be computed in terms of the coefficients in the differential
operator evaluated near this point. To illustrate this claim, consider the
Green function $G(x,\xi)$ for the Schrödinger operator $-\partial_x^2 + q(x) + \lambda$ on the
entire real line. We will show that there is a not exactly obvious (but easy
to obtain once you know the trick) local gradient expansion for the diagonal
elements $D(x) = G(x,x)$. These elements are often all that is needed in
physics. We begin by recalling that we can write

$$
G(x,\xi) \propto u(x)\,v(\xi),
$$

where $u(x)$, $v(x)$ are solutions of $(-\partial_x^2 + q(x) + \lambda)y = 0$ satisfying suitable
boundary conditions to the right and left respectively. We set $D(x) = G(x,x)$
and differentiate three times with respect to $x$. We find

$$
\begin{aligned}
\partial_x^3 D(x) &= u'''v + 3u''v' + 3u'v'' + uv''' \\
&= \left(\partial_x(q+\lambda)u\right)v + 3(q+\lambda)\,\partial_x(uv) + \left(\partial_x(q+\lambda)v\right)u.
\end{aligned}
$$

Here, in passing from the first to second line, we have used the differential
equation obeyed by $u$ and $v$. We can re-express the second line as

$$
\left(q\,\partial_x + \partial_x q - \tfrac{1}{2}\partial_x^3\right) D(x) = -2\lambda\,\partial_x D(x). \tag{5.143}
$$


This relation is known as the Gelfand-Dikii equation. Using it we can find
an expansion for the diagonal element $D(x)$ in terms of $q$ and its derivatives.
We begin by observing that for $q(x) \equiv 0$ we know that $D(x) = 1/(2\sqrt{\lambda})$. We
therefore conjecture that we can expand

$$
D(x) = \frac{1}{2\sqrt{\lambda}}\left(1 - \frac{b_1(x)}{2\lambda} + \frac{b_2(x)}{(2\lambda)^2} + \cdots + (-1)^n\,\frac{b_n(x)}{(2\lambda)^n} + \cdots\right).
$$

If we insert this expansion into (5.143) we see that we get the recurrence
relation

$$
\left(q\,\partial_x + \partial_x q - \tfrac{1}{2}\partial_x^3\right) b_n = \partial_x b_{n+1}.
$$


We can therefore find $b_{n+1}$ from $b_n$ by differentiation followed by a single
integration. Remarkably, $\partial_x b_{n+1}$ is always the exact derivative of a polynomial
in $q$ and its derivatives. Further, the integration constants must be zero
so that we recover the $q \equiv 0$ result. If we carry out this process, we find

$$
\begin{aligned}
b_1(x) &= q(x), \\
b_2(x) &= \frac{3\,q(x)^2}{2} - \frac{q''(x)}{2}, \\
b_3(x) &= \frac{5\,q(x)^3}{2} - \frac{5\,q'(x)^2}{4} - \frac{5\,q(x)\,q''(x)}{2} + \frac{q^{(4)}(x)}{4},
\end{aligned}
$$

and so on. (Note how the terms in the expansion are graded: each $b_n$
is homogeneous in powers of $q$ and its derivatives, provided we count two
$x$ derivatives as being worth one $q(x)$.) Keeping a few terms in this series
expansion can provide an effective approximation for $G(x,x)$, but, in general,
the series is not convergent, being only an asymptotic expansion for $D(x)$.
A similar strategy produces expansions for the diagonal element of the
Green function of other one-dimensional differential operators. Such gradient
expansions also exist in higher dimensions but the higher-dimensional
Seeley-coefficient functions are not as easy to compute. Gradient expansions
for the off-diagonal elements also exist, but, again, they are harder to obtain.
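The coefficients $b_1$, $b_2$, $b_3$ can be checked against the recurrence $(q\,\partial_x + \partial_x q - \tfrac12\partial_x^3)\,b_n = \partial_x b_{n+1}$ with exact arithmetic. The sketch below is only a consistency check, not a derivation: it assumes a polynomial test potential $q(x)$ (an arbitrary, hypothetical choice), represents polynomials as coefficient lists, and confirms that the residual of the recurrence vanishes identically. All helper names are illustrative.

```python
from fractions import Fraction

# A polynomial c0 + c1 x + c2 x^2 + ... is stored as the list [c0, c1, c2, ...].

def padd(p, r):
    n = max(len(p), len(r))
    return [(p[i] if i < len(p) else 0) + (r[i] if i < len(r) else 0) for i in range(n)]

def pscale(c, p):
    return [Fraction(c) * a for a in p]

def pmul(p, r):
    out = [Fraction(0)] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return out

def pdiff(p, order=1):
    for _ in range(order):
        p = [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]
    return p

def is_zero(p):
    return all(a == 0 for a in p)

def gd_operator(q, b):
    """Apply (q d/dx + d/dx q - (1/2) d^3/dx^3) to the polynomial b."""
    first = pmul(q, pdiff(b))            # q b'
    second = pdiff(pmul(q, b))           # (q b)'
    third = pscale(Fraction(-1, 2), pdiff(b, 3))
    return padd(padd(first, second), third)

# Hypothetical test potential q(x) = 1 - 2x + x^3 + 3x^4.
q = [Fraction(c) for c in (1, -2, 0, 1, 3)]

b0 = [Fraction(1)]
b1 = q
b2 = padd(pscale(Fraction(3, 2), pmul(q, q)), pscale(Fraction(-1, 2), pdiff(q, 2)))
b3 = padd(
    padd(pscale(Fraction(5, 2), pmul(q, pmul(q, q))),
         pscale(Fraction(-5, 4), pmul(pdiff(q), pdiff(q)))),
    padd(pscale(Fraction(-5, 2), pmul(q, pdiff(q, 2))),
         pscale(Fraction(1, 4), pdiff(q, 4))),
)

# Residuals of the recurrence; each should be the zero polynomial.
residual_0 = padd(gd_operator(q, b0), pscale(-1, pdiff(b1)))
residual_1 = padd(gd_operator(q, b1), pscale(-1, pdiff(b2)))
residual_2 = padd(gd_operator(q, b2), pscale(-1, pdiff(b3)))
```

Because the recurrence is an identity between differential polynomials in $q$, a vanishing residual for one generic polynomial potential is good evidence, though not a proof, that the listed coefficients are right.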


5.7 Further exercises and problems 


Here are some further exercises that are intended to illustrate the material 
of this chapter: 


Exercise 5.1: Fredholm Alternative. A heavy elastic bar with uniform mass
m per unit length lies almost horizontally. It is supported by a distribution of
upward forces F(x).

Figure 5.12: Elastic bar.

The shape of the bar, y(x), can be found by minimizing the energy

$$
U[y] = \int_0^L \left\{\tfrac{1}{2}\kappa\,(y'')^2 - (F(x) - mg)\,y\right\} dx.
$$

• Show that this minimization leads to the equation

$$
\hat{L}y \equiv \kappa\,\frac{d^4y}{dx^4} = F(x) - mg, \qquad y'' = y''' = 0 \quad \text{at} \quad x = 0, L.
$$

• Show that the boundary conditions are such that the operator $\hat{L}$ is
self-adjoint with respect to an inner product with weight function 1.

• Find the zero modes which span the null space of $\hat{L}$.

• If there are n linearly independent zero modes, then the codimension of
the range of $\hat{L}$ is also n. Using your explicit solutions from the previous
part, find the conditions that must be obeyed by F(x) for a solution of
$\hat{L}y = F - mg$ to exist. What is the physical meaning of these conditions?

• The solution to the equation and boundary conditions is not unique. Is
this non-uniqueness physically reasonable? Explain.


Exercise 5.2: Flexible rod again. A flexible rod is supported near its ends by
means of knife edges that constrain its position, but not its slope or curvature.
It is acted on by a force F(x).

Figure 5.13: Simply supported rod.

The deflection of the rod is found by solving the boundary value problem

$$
\frac{d^4y}{dx^4} = F(x), \qquad y(0) = y(1) = 0, \quad y''(0) = y''(1) = 0.
$$

We wish to find the Green function $G(x,\xi)$ that facilitates the solution of this
problem.

a) If the differential operator and domain (boundary conditions) above is
denoted by L, what is the operator and domain for $L^\dagger$? Is the problem
self-adjoint?

b) Are there any zero-modes? Does F have to satisfy any conditions for the
solution to exist?

c) Write down the conditions, if any, obeyed by $G(x,\xi)$ and its derivatives
$\partial_x G(x,\xi)$, $\partial^2_{xx} G(x,\xi)$, $\partial^3_{xxx} G(x,\xi)$ at $x = 0$, $x = \xi$, and $x = 1$.

d) Using the conditions above, find $G(x,\xi)$. (This requires some boring
algebra, but if you start from the “jump condition” and work down,
it can be completed in under a page.)

e) Is your Green function symmetric ($G(x,\xi) = G(\xi,x)$)? Is this in accord
with the self-adjointness or not of the problem? (You can use this
property as a check of your algebra.)

f) Write down the integral giving the general solution of the boundary value
problem. Assume, if necessary, that F(x) is in the range of the differential
operator. Differentiate your answer and see if it does indeed satisfy the
differential equation and boundary conditions.


Exercise 5.3: Hot ring. The equation governing the steady state heat flow on
a thin ring of unit circumference is

$$
-y'' = f, \qquad 0 < x < 1, \qquad y(0) = y(1), \quad y'(0) = y'(1).
$$

a) This problem has a zero mode. Find the zero mode and the consequent
condition on f(x) for a solution to exist.
b) Verify that a suitable modified Green function for the problem is

$$
g(x,\xi) = \frac{1}{2}(x-\xi)^2 - \frac{1}{2}|x-\xi|.
$$

You will need to verify that $g(x,\xi)$ satisfies both the differential equation
and the boundary conditions.

Exercise 5.4: By using the observation that the left hand side is $2\pi$ times the
eigenfunction expansion of a modified Green function $G(x,0)$ for $L = -\partial_x^2$ on
a circle of unit radius, show that

$$
\sum_{n=-\infty}^{\infty} \frac{e^{inx}}{n^2} = \frac{1}{2}(x-\pi)^2 - \frac{\pi^2}{6}, \qquad x \in [0, 2\pi).
$$

The term with n = 0 is to be omitted from the sum.


Exercise 5.5: Seek a solution to the equation

$$
-\frac{d^2y}{dx^2} = f(x), \qquad x \in [0,1]
$$

with inhomogeneous boundary conditions $y'(0) = F_0$, $y'(1) = F_1$. Observe
that the corresponding homogeneous boundary condition problem has a zero
mode. Therefore the solution, if one exists, cannot be unique.

a) Show that there can be no solution to the differential equation and
inhomogeneous boundary condition unless f(x) satisfies the condition

$$
\int_0^1 f(x)\,dx = F_0 - F_1. \tag{$\star$}
$$

b) Let $G(x,\xi)$ denote the modified Green function (5.57)

$$
G(x,\xi) =
\begin{cases}
\frac{1}{3} - \xi + \frac{x^2+\xi^2}{2}, & 0 < x < \xi, \\
\frac{1}{3} - x + \frac{x^2+\xi^2}{2}, & \xi < x < 1.
\end{cases}
$$

Use the Lagrange-identity method for inhomogeneous boundary conditions
to deduce that if a solution exists then it necessarily obeys

$$
y(x) = \int_0^1 y(\xi)\,d\xi + \int_0^1 G(\xi,x)\,f(\xi)\,d\xi + G(1,x)F_1 - G(0,x)F_0.
$$

c) By differentiating with respect to x, show that

$$
y_{\rm tentative}(x) = \int_0^1 G(\xi,x)\,f(\xi)\,d\xi + G(1,x)F_1 - G(0,x)F_0 + C,
$$

where C is an arbitrary constant, obeys the boundary conditions.

d) By differentiating a second time with respect to x, show that $y_{\rm tentative}(x)$
is a solution of the differential equation if, and only if, the condition $\star$ is
satisfied.

Exercise 5.6: Lattice Green Functions. The $k \times k$ matrices

$$
T_1 = \begin{pmatrix}
2 & -1 & 0 & 0 & 0 & \dots & 0 \\
-1 & 2 & -1 & 0 & 0 & \dots & 0 \\
0 & -1 & 2 & -1 & 0 & \dots & 0 \\
\vdots & & \ddots & \ddots & \ddots & & \vdots \\
0 & \dots & 0 & -1 & 2 & -1 & 0 \\
0 & \dots & 0 & 0 & -1 & 2 & -1 \\
0 & \dots & 0 & 0 & 0 & -1 & 2
\end{pmatrix},
\qquad
T_2 = \begin{pmatrix}
2 & -1 & 0 & 0 & 0 & \dots & 0 \\
-1 & 2 & -1 & 0 & 0 & \dots & 0 \\
0 & -1 & 2 & -1 & 0 & \dots & 0 \\
\vdots & & \ddots & \ddots & \ddots & & \vdots \\
0 & \dots & 0 & -1 & 2 & -1 & 0 \\
0 & \dots & 0 & 0 & -1 & 2 & -1 \\
0 & \dots & 0 & 0 & 0 & -1 & 1
\end{pmatrix}
$$

represent two discrete lattice approximations to $-\partial_x^2$ on a finite interval.

a) What are the boundary conditions defining the domains of the corresponding
continuum differential operators? [They are either Dirichlet
(y = 0) or Neumann (y′ = 0) boundary conditions.] Make sure you
explain your reasoning.

b) Verify that

$$
\begin{aligned}
(T_1^{-1})_{ij} &= \min(i,j) - \frac{ij}{k+1}, \\
(T_2^{-1})_{ij} &= \min(i,j).
\end{aligned}
$$

c) Find the continuum Green functions for the boundary value problems
approximated by the matrix operators. Compare each of the matrix
inverses with its corresponding continuum Green function. Are they
similar?


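Exercise 5.7: Eigenfunction expansion. The resolvent (Green function)
$R_\lambda(x,\xi) = (L - \lambda)^{-1}_{x\xi}$ can be expanded as

$$
(L - \lambda)^{-1}_{x\xi} = \sum_n \frac{\varphi_n(x)\,\varphi_n(\xi)}{\lambda_n - \lambda},
$$

where $\varphi_n(x)$ is the normalized eigenfunction corresponding to the eigenvalue
$\lambda_n$. The resolvent therefore has a pole whenever $\lambda$ approaches $\lambda_n$. Consider
the case

$$
R_{\omega^2}(x,\xi) = \left(-\frac{d^2}{dx^2} - \omega^2\right)^{-1}_{x\xi},
$$

with boundary conditions y(0) = y(L) = 0.
a) Show that

$$
R_{\omega^2}(x,\xi) =
\begin{cases}
\dfrac{1}{\omega\sin\omega L}\,\sin\omega x\,\sin\omega(L-\xi), & x < \xi, \\[2ex]
\dfrac{1}{\omega\sin\omega L}\,\sin\omega(L-x)\,\sin\omega\xi, & \xi < x.
\end{cases}
$$

b) Confirm that $R_{\omega^2}$ becomes singular at exactly those values of $\omega^2$
corresponding to eigenvalues $\omega_n^2$ of $-\frac{d^2}{dx^2}$.
c) Find the associated eigenfunctions $\varphi_n(x)$ and, by taking the limit of
$R_{\omega^2}$ as $\omega^2 \to \omega_n^2$, confirm that the residue of the pole (the coefficient of
$1/(\omega_n^2 - \omega^2)$) is precisely the product of the normalized eigenfunctions
$\varphi_n(x)\,\varphi_n(\xi)$.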
Exercise 5.8: In this exercise we will investigate the self adjointness of the
operator $T = -i\partial/\partial x$ on the interval [a,b] by using the resolvent operator
$R_\lambda = (T - \lambda I)^{-1}$.

a) The integral kernel $R_\lambda(x,\xi)$ is a Green function obeying

$$
\left(-i\frac{\partial}{\partial x} - \lambda\right) R_\lambda(x,\xi) = \delta(x-\xi).
$$

Use standard methods to show that

$$
R_\lambda(x,\xi) = \frac{1}{2}\left(K_\lambda + i\,\mathrm{sgn}(x-\xi)\right) e^{i\lambda(x-\xi)},
$$

where $K_\lambda$ is a number that depends on the boundary conditions imposed
at the endpoints a, b, of the interval.

b) If T is to be self-adjoint then the Green function must be Hermitian, i.e.
$R_\lambda(x,\xi) = [R_\lambda(\xi,x)]^*$. Find the condition on $K_\lambda$ for this to be true, and
show that it implies that

$$
\frac{R_\lambda(b,\xi)}{R_\lambda(a,\xi)} = e^{i\theta_\lambda},
$$

where $\theta_\lambda$ is some real angle. Deduce that the range of $R_\lambda$ is the set of
functions

$$
\mathcal{D}_\lambda = \{y(x) : y(b) = e^{i\theta_\lambda}\,y(a)\}.
$$

Now the range of $R_\lambda$ is the domain of $(T - \lambda I)$, which should be the same
as the domain of T and therefore not depend on $\lambda$. We therefore require
that $\theta_\lambda$ not depend on $\lambda$. Deduce that T will be self-adjoint only for
boundary conditions $y(b) = e^{i\theta}y(a)$, i.e. for twisted periodic boundary
conditions.

c) Show that with the twisted periodic boundary conditions of part b), we
have

$$
K_\lambda = -\cot\left(\frac{\lambda(b-a) - \theta}{2}\right).
$$

From this, show that $R_\lambda(x,\xi)$ has simple poles at $\lambda = \lambda_n$, where $\lambda_n$ are
the eigenvalues of T.

d) Compute the residue of the pole of $R_\lambda(x,\xi)$ at the eigenvalue $\lambda_n$, and
confirm that it is a product of the corresponding normalized eigenfunctions.


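Problem 5.9: Consider the one-dimensional Dirac Hamiltonian

$$
\hat{H} = \begin{pmatrix} -i\partial_x & m_1 - i m_2 \\ m_1 + i m_2 & +i\partial_x \end{pmatrix}
= -i\hat\sigma_3\,\partial_x + m_1(x)\,\hat\sigma_1 + m_2(x)\,\hat\sigma_2.
$$

Here $m_1(x)$, $m_2(x)$ are real functions, and the $\hat\sigma_i$ are the Pauli matrices. $\hat{H}$
acts on a two-component “spinor”

$$
\Psi(x) = \begin{pmatrix} \psi_1(x) \\ \psi_2(x) \end{pmatrix}.
$$

Impose self-adjoint boundary conditions

$$
\frac{\psi_1(a)}{\psi_2(a)} = \exp\{i\theta_a\}, \qquad \frac{\psi_1(b)}{\psi_2(b)} = \exp\{i\theta_b\}
$$

at the ends of the interval [a,b]. Let $\Psi_L(x)$ be a solution of $\hat{H}\Psi = \lambda\Psi$ obeying
the boundary condition at $x = a$, and $\Psi_R(x)$ be a solution obeying the
boundary condition at $x = b$. Define the “Wronskian” of these solutions to be

$$
W(\Psi_L, \Psi_R) = \Psi_L^\dagger\,\hat\sigma_3\,\Psi_R.
$$

a) Show that, for real $\lambda$ and the given boundary conditions, the Wronskian
$W(\Psi_L, \Psi_R)$ is independent of position. Show also that $W(\Psi_L, \Psi_L) =
W(\Psi_R, \Psi_R) = 0$.

b) Show that the matrix-valued Green function $\hat{G}(x,\xi)$ obeying

$$
(\hat{H} - \lambda I)\,\hat{G}(x,\xi) = I\,\delta(x-\xi),
$$

and the given boundary conditions has entries

$$
G_{\alpha\beta}(x,\xi) =
\begin{cases}
-\dfrac{i}{W^*}\,\psi_{L,\alpha}(x)\,\psi^*_{R,\beta}(\xi), & x < \xi, \\[2ex]
+\dfrac{i}{W}\,\psi_{R,\alpha}(x)\,\psi^*_{L,\beta}(\xi), & x > \xi.
\end{cases}
$$

Observe that $G_{\alpha\beta}(x,\xi) = G^*_{\beta\alpha}(\xi,x)$, as befits the inverse of a self-adjoint
operator.

c) The Green function is discontinuous at $x = \xi$, but we can define a
“position-diagonal” part by taking the average

$$
G_{\alpha\beta}(x) \stackrel{\rm def}{=} \frac{1}{2}\left(\frac{i}{W}\,\psi_{R,\alpha}(x)\,\psi^*_{L,\beta}(x) - \frac{i}{W^*}\,\psi_{L,\alpha}(x)\,\psi^*_{R,\beta}(x)\right).
$$

Show that if we define the matrix $\hat{g}(x)$ by setting $\hat{g}(x) = \hat{G}(x)\hat\sigma_3$, then
$\mathrm{tr}\,\hat{g}(x) = 0$ and $\hat{g}^2(x) = -\frac{1}{4}I$. Show further that

$$
i\,\partial_x \hat{g} = [\hat{g}, \hat{K}], \tag{$\star$}
$$

where $\hat{K}(x) = \hat\sigma_3\left(\lambda I - m_1(x)\,\hat\sigma_1 - m_2(x)\,\hat\sigma_2\right)$.

The equation ($\star$) obtained in part (c) is the analogue of the Gelfand-Dikii
equation for the Dirac Hamiltonian. It has applications in the theory of
superconductivity, where ($\star$) is known as the Eilenberger equation.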


Chapter 6 


Partial Differential Equations 


Most differential equations of physics involve quantities depending on both
space and time. Inevitably they involve partial derivatives, and so are partial
differential equations (PDE’s). Although PDE’s are inherently more
complicated than ODE’s, many of the ideas from the previous chapters, in
particular the notion of self adjointness and the resulting completeness of the
eigenfunctions, carry over to the partial differential operators that occur
in these equations.


6.1 Classification of PDE’s 


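We focus on second-order equations in two variables, such as the wave
equation

$$
\frac{\partial^2\varphi}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2\varphi}{\partial t^2} = f(x,t), \qquad \text{(Hyperbolic)} \tag{6.1}
$$

Laplace or Poisson’s equation

$$
\frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} = f(x,y), \qquad \text{(Elliptic)} \tag{6.2}
$$

or Fourier’s heat equation

$$
\frac{\partial^2\varphi}{\partial x^2} - \kappa\,\frac{\partial\varphi}{\partial t} = f(x,t). \qquad \text{(Parabolic)} \tag{6.3}
$$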

What do the names hyperbolic, elliptic and parabolic mean? In high-school
co-ordinate geometry we learned that a real quadratic curve

$$
ax^2 + 2bxy + cy^2 + fx + gy + h = 0 \tag{6.4}
$$

represents a hyperbola, an ellipse or a parabola depending on whether the
discriminant, $ac - b^2$, is less than zero, greater than zero, or equal to zero,
these being the conditions for the matrix

$$
\begin{bmatrix} a & b \\ b & c \end{bmatrix} \tag{6.5}
$$

to have signature $(+,-)$, $(+,+)$ or $(+,0)$.
By analogy, the equation

$$
a(x,y)\frac{\partial^2\varphi}{\partial x^2} + 2b(x,y)\frac{\partial^2\varphi}{\partial x\,\partial y} + c(x,y)\frac{\partial^2\varphi}{\partial y^2} + (\text{lower orders}) = 0, \tag{6.6}
$$

is said to be hyperbolic, elliptic, or parabolic at a point $(x,y)$ if

$$
\begin{vmatrix} a(x,y) & b(x,y) \\ b(x,y) & c(x,y) \end{vmatrix} = (ac - b^2)\big|_{(x,y)} \tag{6.7}
$$

is less than, greater than, or equal to zero, respectively. This classification
helps us understand what sort of initial or boundary data we need to specify
the problem.
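The pointwise test in (6.7) is easy to mechanize. The helper below is a minimal sketch: the function name `classify` and the optional tolerance `tol` (useful when the coefficients are floating-point numbers) are illustrative assumptions, and the coefficients are those of the second-order terms written in the form (6.6), so that `b` is half the coefficient of the mixed derivative.

```python
def classify(a, b, c, tol=0.0):
    """Classify a u_xx + 2 b u_xy + c u_yy + (lower orders) = 0 at a point via ac - b^2."""
    disc = a * c - b * b
    if disc < -tol:
        return "hyperbolic"
    if disc > tol:
        return "elliptic"
    return "parabolic"

# The three model equations (6.1)-(6.3), with the second variable playing the role of y.
wave = classify(1.0, 0.0, -1.0 / 3.0**2)   # u_xx - (1/c^2) u_tt with c = 3 (assumed value)
laplace = classify(1.0, 0.0, 1.0)          # u_xx + u_yy
heat = classify(1.0, 0.0, 0.0)             # u_xx - kappa u_t has no u_tt term

# An equation can change type from point to point, e.g. y u_xx + u_yy = 0.
mixed = [classify(y, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
```

Because the coefficients may depend on position, the classification is a property of the equation at a point, as the last example illustrates.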

There are three broad classes of boundary conditions: 

a) Dirichlet boundary conditions: The value of the dependent vari- 
able is specified on the boundary. 
b) Neumann boundary conditions: The normal derivative of the de- 
pendent variable is specified on the boundary. 
c) Cauchy boundary conditions: Both the value and the normal deriva- 
tive of the dependent variable are specified on the boundary. 
Less commonly met are Robin boundary conditions, where the value of a 
linear combination of the dependent variable and the normal derivative of 
the dependent variable is specified on the boundary. 

Cauchy boundary conditions are analogous to the initial conditions for a 
second-order ordinary differential equation. These are given at one end of 
the interval only. The other two classes of boundary condition are higher- 
dimensional analogues of the conditions we impose on an ODE at both ends 
of the interval. 

Each class of PDE’s requires a different class of boundary conditions in 
order to have a unique, stable solution. 

1) Elliptic equations require either Dirichlet or Neumann boundary conditions
on a closed boundary surrounding the region of interest. Other
boundary conditions are either insufficient to determine a unique solution,
overly restrictive, or lead to instabilities.

2) Hyperbolic equations require Cauchy boundary conditions on an open
surface. Other boundary conditions are either too restrictive for a
solution to exist, or insufficient to determine a unique solution.

3) Parabolic equations require Dirichlet or Neumann boundary conditions
on an open surface. Other boundary conditions are too restrictive.


6.2 Cauchy data 
Given a second-order ordinary differential equation

$$
p_0\,y'' + p_1\,y' + p_2\,y = f \tag{6.8}
$$

with initial data $y(a)$, $y'(a)$ we can construct the solution incrementally. We
take a step $\delta x = \varepsilon$ and use the initial slope to find $y(a+\varepsilon) = y(a) + \varepsilon y'(a)$.
Next we find $y''(a)$ from the differential equation

$$
y''(a) = -\frac{1}{p_0}\left(p_1\,y'(a) + p_2\,y(a) - f(a)\right), \tag{6.9}
$$

and use it to obtain $y'(a+\varepsilon) = y'(a) + \varepsilon y''(a)$. We now have initial data,
$y(a+\varepsilon)$, $y'(a+\varepsilon)$, at the point $a+\varepsilon$, and can play the same game to proceed
to $a + 2\varepsilon$, and onwards.
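The incremental construction just described is the explicit Euler method. The sketch below follows the two update rules literally; the function name `march`, the step size and the test equation $y'' + y = 0$ (whose solution with $y(0) = 0$, $y'(0) = 1$ is $\sin x$) are assumptions chosen for illustration.

```python
import math

def march(p0, p1, p2, f, a, y_a, dy_a, eps, n_steps):
    """Propagate the data (y, y') from x = a in n_steps steps of size eps."""
    x, y, dy = a, y_a, dy_a
    for _ in range(n_steps):
        # Second derivative from the differential equation, as in (6.9).
        d2y = -(p1(x) * dy + p2(x) * y - f(x)) / p0(x)
        # New data at x + eps, both built from the old data at x.
        y, dy = y + eps * dy, dy + eps * d2y
        x += eps
    return x, y, dy

# Test problem (assumed): y'' + y = 0 with y(0) = 0, y'(0) = 1.
one = lambda x: 1.0
zero = lambda x: 0.0
x_end, y_end, dy_end = march(one, zero, one, zero, 0.0, 0.0, 1.0, 1e-4, 10_000)
```

The scheme is only first-order accurate in $\varepsilon$, and it fails outright wherever $p_0(x)$ vanishes, which is the ODE counterpart of the obstruction met below for partial differential equations.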


Figure 6.1: The surface $\Gamma$ on which we are given Cauchy Data.

Suppose now that we have the analogous situation of a second order
partial differential equation

$$
a_{\mu\nu}(x)\,\frac{\partial^2\varphi}{\partial x^\mu\,\partial x^\nu} + (\text{lower orders}) = 0 \tag{6.10}
$$

in $\mathbb{R}^n$. We are also given initial data on a surface, $\Gamma$, of co-dimension one in
$\mathbb{R}^n$.

At each point p on $\Gamma$ we erect a basis $\mathbf{n}, \mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_{n-1}$, consisting of the
normal to $\Gamma$ and $n-1$ tangent vectors. The information we have been given
consists of the value of $\varphi$ at every point p together with

$$
\frac{\partial\varphi}{\partial n} \stackrel{\rm def}{=} n^\mu\,\frac{\partial\varphi}{\partial x^\mu}, \tag{6.11}
$$

the normal derivative of $\varphi$ at p. We want to know if this Cauchy data
is sufficient to find the second derivative in the normal direction, and so
construct similar Cauchy data on the adjacent surface $\Gamma + \varepsilon\mathbf{n}$. If so, we can
repeat the process and systematically propagate the solution forward through
$\mathbb{R}^n$.

From the given data, we can construct

$$
\begin{aligned}
\frac{\partial^2\varphi}{\partial n\,\partial t_i} &\stackrel{\rm def}{=} n^\mu t_i^\nu\,\frac{\partial^2\varphi}{\partial x^\mu\,\partial x^\nu}, \\
\frac{\partial^2\varphi}{\partial t_i\,\partial t_j} &\stackrel{\rm def}{=} t_i^\nu t_j^\mu\,\frac{\partial^2\varphi}{\partial x^\mu\,\partial x^\nu},
\end{aligned} \tag{6.12}
$$

but we do not yet have enough information to determine

$$
\frac{\partial^2\varphi}{\partial n\,\partial n} \stackrel{\rm def}{=} n^\mu n^\nu\,\frac{\partial^2\varphi}{\partial x^\mu\,\partial x^\nu}. \tag{6.13}
$$

Can we fill the data gap by using the differential equation (6.10)? Suppose
that

$$
\frac{\partial^2\varphi}{\partial x^\mu\,\partial x^\nu} = \phi_0^{\mu\nu} + n^\mu n^\nu\,\Phi \tag{6.14}
$$

where $\phi_0^{\mu\nu}$ is a guess that is consistent with (6.12), and $\Phi$ is as yet unknown,
and, because of the factor of $n^\mu n^\nu$, does not affect the derivatives (6.12). We
plug into

$$
a_{\mu\nu}(x)\,\frac{\partial^2\varphi}{\partial x^\mu\,\partial x^\nu} + (\text{known lower orders}) = 0 \tag{6.15}
$$

and get

$$
a_{\mu\nu}\,n^\mu n^\nu\,\Phi + (\text{known}) = 0. \tag{6.16}
$$

We can therefore find $\Phi$ provided that

$$
a_{\mu\nu}\,n^\mu n^\nu \neq 0. \tag{6.17}
$$

If this expression is zero, we are stuck. It is like having $p_0(x) = 0$ in an
ordinary differential equation. On the other hand, knowing $\Phi$ tells us the
second normal derivative, and we can proceed to the adjacent surface where
we play the same game once more.

Definition: A characteristic surface is a surface $\Sigma$ such that $a_{\mu\nu}n^\mu n^\nu = 0$
at all points on $\Sigma$. We can therefore propagate our data forward, provided
that the initial-data surface $\Gamma$ is nowhere tangent to a characteristic surface.
In two dimensions the characteristic surfaces become one-dimensional curves.
An equation in two dimensions is hyperbolic, parabolic, or elliptic at a
point $(x,y)$ if it has two, one or zero characteristic curves through that point,
respectively.

Characteristics are both a curse and blessing. They are a barrier to
Cauchy data, but, as we see in the next two subsections, they are also the
curves along which information is transmitted.


6.2.1 Characteristics and first-order equations 


Suppose we have a linear first-order partial differential equation

$$
a(x,y)\frac{\partial u}{\partial x} + b(x,y)\frac{\partial u}{\partial y} + c(x,y)\,u = f(x,y). \tag{6.18}
$$

We can write this in vector notation as $(\mathbf{v}\cdot\nabla)u + cu = f$, where $\mathbf{v}$ is the
vector field $\mathbf{v} = (a,b)$. If we define the flow of the vector field to be the
family of parametrized curves $x(t)$, $y(t)$ satisfying

$$
\frac{dx}{dt} = a(x,y), \qquad \frac{dy}{dt} = b(x,y), \tag{6.19}
$$

then the partial differential equation (6.18) reduces to an ordinary linear
differential equation

$$
\frac{du}{dt} + c(t)\,u(t) = f(t) \tag{6.20}
$$

along each flow line. Here,

$$
\begin{aligned}
u(t) &\equiv u(x(t), y(t)), \\
c(t) &\equiv c(x(t), y(t)), \\
f(t) &\equiv f(x(t), y(t)).
\end{aligned} \tag{6.21}
$$

Figure 6.2: Initial data curve $\Gamma$, and flow-line characteristics.

Provided that $a(x,y)$ and $b(x,y)$ are never simultaneously zero, there will be
one flow-line curve passing through each point in $\mathbb{R}^2$. If we have been given
the initial value of u on a curve $\Gamma$ that is nowhere tangent to any of these flow
lines then we can propagate this data forward along the flow by solving (6.20).
On the other hand, if the curve $\Gamma$ does become tangent to one of the flow
lines at some point then the data will generally be inconsistent with (6.18)
at that point, and no solution can exist. The flow lines therefore play a role
analogous to the characteristics of a second-order partial differential equation,
and are therefore also called characteristics. The trick of reducing the partial
differential equation to a collection of ordinary differential equations along
each of its flow lines is called the method of characteristics.
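The method of characteristics translates directly into a numerical recipe: integrate the flow equations (6.19) together with the ODE (6.20) for $u$. The sketch below uses a hand-written fourth-order Runge-Kutta step. As an illustrative assumption it takes the equation $u_x - u_y - (x-y)u = 0$, whose general solution $e^{-xy}f(x+y)$ can be verified by substitution, with hypothetical data $u(x,0) = \sin x$ given on the line $y = 0$, which is nowhere tangent to the flow direction $(1,-1)$. All function names are illustrative.

```python
import math

def integrate_characteristic(a, b, c, f, x0, y0, u0, t_end, n_steps=1000):
    """RK4 for dx/dt = a, dy/dt = b, du/dt = f - c u, starting from (x0, y0, u0)."""
    def rhs(state):
        x, y, u = state
        return (a(x, y), b(x, y), f(x, y) - c(x, y) * u)

    h = t_end / n_steps
    s = (x0, y0, u0)
    for _ in range(n_steps):
        k1 = rhs(s)
        k2 = rhs(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
        k3 = rhs(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
        k4 = rhs(tuple(si + h * ki for si, ki in zip(s, k3)))
        s = tuple(si + h * (p + 2 * q + 2 * r + w) / 6.0
                  for si, p, q, r, w in zip(s, k1, k2, k3, k4))
    return s

# Assumed example: u_x - u_y - (x - y) u = 0, i.e. a = 1, b = -1, c = -(x - y), f = 0.
a = lambda x, y: 1.0
b = lambda x, y: -1.0
c = lambda x, y: -(x - y)
f = lambda x, y: 0.0
g = math.sin                      # hypothetical initial data u(x, 0) = g(x)

x0 = 0.8
x_end, y_end, u_end = integrate_characteristic(a, b, c, f, x0, 0.0, g(x0), 0.5)

# Value predicted by the general solution exp(-x y) g(x + y) at the end point.
u_exact = math.exp(-x_end * y_end) * g(x_end + y_end)
```

Each starting point on the data curve yields the solution along one characteristic; sweeping the starting point along the curve fills out the solution in the region covered by the flow.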


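Exercise 6.1: Show that the general solution to the equation

$$
\frac{\partial\varphi}{\partial x} - \frac{\partial\varphi}{\partial y} - (x-y)\,\varphi = 0
$$

is

$$
\varphi(x,y) = e^{-xy} f(x+y),
$$

where f is an arbitrary function.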


6.2.2 Second-order hyperbolic equations 


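Consider a second-order equation containing the operator

$$
D = a(x,y)\frac{\partial^2}{\partial x^2} + 2b(x,y)\frac{\partial^2}{\partial x\,\partial y} + c(x,y)\frac{\partial^2}{\partial y^2}. \tag{6.22}
$$

We can always factorize

$$
aX^2 + 2bXY + cY^2 = (\alpha X + \beta Y)(\gamma X + \delta Y), \tag{6.23}
$$

and from this obtain

$$
\begin{aligned}
a\frac{\partial^2}{\partial x^2} + 2b\frac{\partial^2}{\partial x\,\partial y} + c\frac{\partial^2}{\partial y^2}
&= \left(\alpha\frac{\partial}{\partial x} + \beta\frac{\partial}{\partial y}\right)\left(\gamma\frac{\partial}{\partial x} + \delta\frac{\partial}{\partial y}\right) + \text{lower} \\
&= \left(\gamma\frac{\partial}{\partial x} + \delta\frac{\partial}{\partial y}\right)\left(\alpha\frac{\partial}{\partial x} + \beta\frac{\partial}{\partial y}\right) + \text{lower}.
\end{aligned} \tag{6.24}
$$

Here “lower” refers to terms containing only first order derivatives such as

$$
\alpha\left(\frac{\partial\gamma}{\partial x}\right)\frac{\partial}{\partial x}, \qquad \beta\left(\frac{\partial\delta}{\partial y}\right)\frac{\partial}{\partial y}, \qquad \text{etc.}
$$

A necessary condition, however, for the coefficients $\alpha, \beta, \gamma, \delta$ to be real is that

$$
\begin{aligned}
ac - b^2 &= \alpha\beta\gamma\delta - \tfrac{1}{4}(\alpha\delta + \beta\gamma)^2 \\
&= -\tfrac{1}{4}(\alpha\delta - \beta\gamma)^2 \le 0.
\end{aligned} \tag{6.25}
$$

A factorization of the leading terms in the second-order operator D as the
product of two real first-order differential operators therefore requires that
D be hyperbolic or parabolic. It is easy to see that this is also a sufficient
condition for such a real factorization. For the rest of this section we assume
that the equation is hyperbolic, and so

$$
ac - b^2 = -\tfrac{1}{4}(\alpha\delta - \beta\gamma)^2 < 0. \tag{6.26}
$$

With this condition, the two families of flow curves defined by

$$
C_1: \quad \frac{dx}{dt} = \alpha(x,y), \qquad \frac{dy}{dt} = \beta(x,y), \tag{6.27}
$$

and

$$
C_2: \quad \frac{dx}{dt} = \gamma(x,y), \qquad \frac{dy}{dt} = \delta(x,y), \tag{6.28}
$$

are distinct, and are the characteristics of D.

A hyperbolic second-order differential equation $Du = 0$ can therefore be
written in either of two ways:

$$
\left(\alpha\frac{\partial}{\partial x} + \beta\frac{\partial}{\partial y}\right) U_1 + F_1 = 0, \tag{6.29}
$$

or

$$
\left(\gamma\frac{\partial}{\partial x} + \delta\frac{\partial}{\partial y}\right) U_2 + F_2 = 0, \tag{6.30}
$$

where

$$
\begin{aligned}
U_1 &= \gamma\frac{\partial u}{\partial x} + \delta\frac{\partial u}{\partial y}, \\
U_2 &= \alpha\frac{\partial u}{\partial x} + \beta\frac{\partial u}{\partial y},
\end{aligned} \tag{6.31}
$$

and $F_{1,2}$ contain only $\partial u/\partial x$ and $\partial u/\partial y$. Given suitable Cauchy data, we
can solve the two first-order partial differential equations by the method
of characteristics described in the previous subsection, and so find $U_1(x,y)$
and $U_2(x,y)$. Because the hyperbolicity condition (6.26) guarantees that the
determinant

$$
\begin{vmatrix} \gamma & \delta \\ \alpha & \beta \end{vmatrix} = \gamma\beta - \alpha\delta
$$

is not zero, we can solve (6.31) and so extract from $U_{1,2}$ the individual
derivatives $\partial u/\partial x$ and $\partial u/\partial y$. From these derivatives and the initial values of u,
we can determine $u(x,y)$.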


6.3 Wave equation 


The wave equation provides the paradigm for hyperbolic equations that can 
be solved by the method of characteristics. 


6.3.1 d’Alembert’s solution 
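Let $\varphi(x,t)$ obey the wave equation

$$
\frac{\partial^2\varphi}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2\varphi}{\partial t^2} = 0, \qquad -\infty < x < \infty. \tag{6.32}
$$

We use the method of characteristics to propagate Cauchy data $\varphi(x,0) =
\varphi_0(x)$ and $\dot\varphi(x,0) = v_0(x)$, given on the curve $\Gamma = \{x \in \mathbb{R},\ t = 0\}$, forward
in time.

We begin by factoring the wave equation as

$$
0 = \left(\frac{\partial^2\varphi}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2\varphi}{\partial t^2}\right)
= \left(\frac{\partial}{\partial x} + \frac{1}{c}\frac{\partial}{\partial t}\right)\left(\frac{\partial\varphi}{\partial x} - \frac{1}{c}\frac{\partial\varphi}{\partial t}\right). \tag{6.33}
$$

Thus,

$$
\left(\frac{\partial}{\partial x} + \frac{1}{c}\frac{\partial}{\partial t}\right)(U - V) = 0, \tag{6.34}
$$

where

$$
U = \varphi' = \frac{\partial\varphi}{\partial x}, \qquad V = \frac{1}{c}\dot\varphi = \frac{1}{c}\frac{\partial\varphi}{\partial t}. \tag{6.35}
$$

The quantity $U - V$ is therefore constant along the characteristic curves

$$
x - ct = \text{const.} \tag{6.36}
$$

Writing the linear factors in the reverse order yields the equation

$$
\left(\frac{\partial}{\partial x} - \frac{1}{c}\frac{\partial}{\partial t}\right)(U + V) = 0. \tag{6.37}
$$

This implies that $U + V$ is constant along the characteristics

$$
x + ct = \text{const.} \tag{6.38}
$$

Putting these two facts together tells us that

$$
\begin{aligned}
V(x,t') &= \tfrac{1}{2}\left[V(x,t') + U(x,t')\right] + \tfrac{1}{2}\left[V(x,t') - U(x,t')\right] \\
&= \tfrac{1}{2}\left[V(x+ct',0) + U(x+ct',0)\right] + \tfrac{1}{2}\left[V(x-ct',0) - U(x-ct',0)\right].
\end{aligned} \tag{6.39}
$$

The value of the variable V at the point $(x,t')$ has therefore been computed
in terms of the values of U and V on the initial curve $\Gamma$. After changing
variables from $t'$ to $\xi = x \pm ct'$ as appropriate, we can integrate up to find

$$
\begin{aligned}
\varphi(x,t) &= \varphi(x,0) + c\int_0^t V(x,t')\,dt' \\
&= \varphi(x,0) + \frac{1}{2}\int_x^{x+ct} \varphi'(\xi,0)\,d\xi + \frac{1}{2}\int_x^{x-ct} \varphi'(\xi,0)\,d\xi + \frac{1}{2c}\int_{x-ct}^{x+ct} \dot\varphi(\xi,0)\,d\xi \\
&= \frac{1}{2}\left\{\varphi(x+ct,0) + \varphi(x-ct,0)\right\} + \frac{1}{2c}\int_{x-ct}^{x+ct} \dot\varphi(\xi,0)\,d\xi.
\end{aligned} \tag{6.40}
$$

This result

$$
\varphi(x,t) = \frac{1}{2}\left\{\varphi_0(x+ct) + \varphi_0(x-ct)\right\} + \frac{1}{2c}\int_{x-ct}^{x+ct} v_0(\xi)\,d\xi \tag{6.41}
$$

is usually known as d’Alembert’s solution of the wave equation. It was
actually obtained first by Euler in 1748.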
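Formula (6.41) is simple to evaluate numerically. The sketch below is a minimal illustration that uses a composite Simpson rule for the velocity integral; the function names, the wave speed $c = 2$ and the test data are assumptions. The data $\varphi_0 = \sin x$, $v_0 = c\cos x$ are chosen because they correspond to the purely left-moving wave $\sin(x + ct)$, which gives an exact value to compare against.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(lo + j * h)
    return total * h / 3.0

def dalembert(phi0, v0, c, x, t):
    """Evaluate (6.41) for initial displacement phi0 and initial velocity v0."""
    average = 0.5 * (phi0(x + c * t) + phi0(x - c * t))
    return average + simpson(v0, x - c * t, x + c * t) / (2.0 * c)

c = 2.0                                   # assumed wave speed
phi0 = math.sin
v0 = lambda s: c * math.cos(s)

value = dalembert(phi0, v0, c, 0.3, 0.4)
expected = math.sin(0.3 + c * 0.4)        # the left-moving wave sin(x + ct)
```

Only the data in the interval $[x - ct, x + ct]$ enter the calculation, so changing $\varphi_0$ or $v_0$ outside that interval leaves the computed value untouched.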


Figure 6.3: Range of Cauchy data influencing $\varphi(x,t)$.

The value of $\varphi$ at $x, t$ is determined by only a finite interval of the initial
Cauchy data. In more generality, $\varphi(x,t)$ depends only on what happens in
the past light-cone of the point, which is bounded by a pair of characteristic
curves. This is illustrated in figure 6.3.

D’Alembert and Euler squabbled over whether $\varphi_0$ and $v_0$ had to be twice
differentiable for the solution (6.41) to make sense. Euler wished to apply
(6.41) to a plucked string, which has a discontinuous slope at the plucked
point, but d’Alembert argued that the wave equation, with its second
derivative, could not be applied in this case. This was a dispute that could not be
resolved (in Euler’s favour) until the advent of the theory of distributions. It
highlights an important difference between ordinary and partial differential
equations: an ODE with smooth coefficients has smooth solutions; a PDE
with smooth coefficients can admit discontinuous or even distributional
solutions.

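An alternative route to d’Alembert’s solution uses a method that applies
most effectively to PDE’s with constant coefficients. We first seek a general
solution to the PDE involving two arbitrary functions. Begin with a change
of variables. Let

$$
\begin{aligned}
\xi &= x + ct, \\
\eta &= x - ct
\end{aligned} \tag{6.42}
$$

be light-cone co-ordinates. In terms of them, we have

$$
\begin{aligned}
x &= \frac{1}{2}(\xi + \eta), \\
t &= \frac{1}{2c}(\xi - \eta).
\end{aligned} \tag{6.43}
$$

Now,

$$
\frac{\partial}{\partial\xi} = \frac{\partial x}{\partial\xi}\frac{\partial}{\partial x} + \frac{\partial t}{\partial\xi}\frac{\partial}{\partial t} = \frac{1}{2}\left(\frac{\partial}{\partial x} + \frac{1}{c}\frac{\partial}{\partial t}\right). \tag{6.44}
$$

Similarly

$$
\frac{\partial}{\partial\eta} = \frac{1}{2}\left(\frac{\partial}{\partial x} - \frac{1}{c}\frac{\partial}{\partial t}\right). \tag{6.45}
$$

Thus

$$
\left(\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right) = \left(\frac{\partial}{\partial x} + \frac{1}{c}\frac{\partial}{\partial t}\right)\left(\frac{\partial}{\partial x} - \frac{1}{c}\frac{\partial}{\partial t}\right) = 4\,\frac{\partial^2}{\partial\xi\,\partial\eta}. \tag{6.46}
$$

The characteristics of the equation

$$
4\,\frac{\partial^2\varphi}{\partial\xi\,\partial\eta} = 0 \tag{6.47}
$$

are $\xi = \text{const.}$ or $\eta = \text{const.}$ There are two characteristic curves through
each point, so the equation is still hyperbolic.
With light-cone coordinates it is easy to see that a solution to

$$
\left(\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)\varphi = 4\,\frac{\partial^2\varphi}{\partial\xi\,\partial\eta} = 0 \tag{6.48}
$$

is

$$
\varphi(x,t) = f(\xi) + g(\eta) = f(x+ct) + g(x-ct). \tag{6.49}
$$

It is this expression that was obtained by d’Alembert (1746).
Following Euler, we use d’Alembert’s general solution to propagate the
Cauchy data $\varphi(x,0) \equiv \varphi_0(x)$ and $\dot\varphi(x,0) \equiv v_0(x)$ by using this information
to determine the functions f and g. We have

$$
\begin{aligned}
f(x) + g(x) &= \varphi_0(x), \\
c\left(f'(x) - g'(x)\right) &= v_0(x).
\end{aligned} \tag{6.50}
$$

Integration of the second line with respect to x gives

$$
f(x) - g(x) = \frac{1}{c}\int_0^x v_0(\xi)\,d\xi + A, \tag{6.51}
$$

where A is an unknown (but irrelevant) constant. We can now solve for f
and g, and find

$$
\begin{aligned}
f(x) &= \frac{1}{2}\varphi_0(x) + \frac{1}{2c}\int_0^x v_0(\xi)\,d\xi + \frac{1}{2}A, \\
g(x) &= \frac{1}{2}\varphi_0(x) - \frac{1}{2c}\int_0^x v_0(\xi)\,d\xi - \frac{1}{2}A,
\end{aligned} \tag{6.52}
$$

and so

$$
\varphi(x,t) = \frac{1}{2}\left\{\varphi_0(x+ct) + \varphi_0(x-ct)\right\} + \frac{1}{2c}\int_{x-ct}^{x+ct} v_0(\xi)\,d\xi. \tag{6.53}
$$

The unknown constant A has disappeared in the end result, and again we
find “d’Alembert’s” solution.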

Exercise 6.2: Show that when the operator D in a constant-coefficient
second-order PDE $D\varphi = 0$ is reducible, meaning that it can be factored into two
distinct first-order factors $D = P_1P_2$, where

$$
P_i = \alpha_i\frac{\partial}{\partial x} + \beta_i\frac{\partial}{\partial y} + \gamma_i,
$$

then the general solution to $D\varphi = 0$ can be written as $\varphi = \phi_1 + \phi_2$, where
$P_1\phi_1 = 0$, $P_2\phi_2 = 0$. Hence, or otherwise, show that the general solution to
the equation

$$
\frac{\partial^2\varphi}{\partial x\,\partial y} + 2\frac{\partial^2\varphi}{\partial y^2} - \frac{\partial\varphi}{\partial x} - 2\frac{\partial\varphi}{\partial y} = 0
$$

is

$$
\varphi(x,y) = f(2x - y) + e^{y} g(x),
$$

where f, g, are arbitrary functions.

Exercise 6.3: Show that when the constant-coefficient operator D is of the
form

$$
D = P^2 = \left(\alpha\frac{\partial}{\partial x} + \beta\frac{\partial}{\partial y} + \gamma\right)^2,
$$

with $\alpha \neq 0$, then the general solution to $D\varphi = 0$ is given by $\varphi = \phi_1 + x\phi_2$,
where $P\phi_{1,2} = 0$. (If $\alpha = 0$ and $\beta \neq 0$, then $\varphi = \phi_1 + y\phi_2$.)


6.3.2 Fourier’s solution 


In 1755 Daniel Bernoulli proposed solving for the motion of a finite length L 
of transversely vibrating string by setting 


\[ y(x,t) = \sum_{n=1}^{\infty} A_n \sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right), \tag{6.54} \]
but he did not know how to find the coefficients $A_n$ (and perhaps did not
care that his cosine time dependence restricted his solution to the initial
condition $\dot y(x,0) = 0$). Bernoulli’s idea was dismissed out of hand by Euler
and d’Alembert as being too restrictive. They simply refused to believe that 
(almost) any chosen function could be represented by a trigonometric series 
expansion. It was only fifty years later, in a series of papers starting in 
1807, that Joseph Fourier showed how to compute the $A_n$ and insisted that
indeed “any” function could be expanded in this way. Mathematicians have 
expended much effort in investigating the extent to which Fourier’s claim is 
true. 

We now try our hand at Bernoulli’s game. Because we are solving the 
wave equation on the infinite line, we seek a solution as a Fourier integral. 
A sufficiently general form is 


\[ \varphi(x,t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\left\{a(k)e^{ikx - i\omega_k t} + a^*(k)e^{-ikx + i\omega_k t}\right\}, \tag{6.55} \]
where $\omega_k = c|k|$ is the positive root of $\omega^2 = c^2k^2$. The terms being summed
by the integral are each individually of the form $f(x - ct)$ or $f(x + ct)$, and so
$\varphi(x,t)$ is indeed a solution of the wave equation. The positive-root convention
means that positive $k$ corresponds to right-going waves, and negative $k$ to
left-going waves.

We find the amplitudes a(k) by fitting to the Fourier transforms 


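\[ \begin{aligned} \Phi(k) &\equiv \int_{-\infty}^{\infty}\varphi(x,t=0)\,e^{-ikx}\,dx, \\ \chi(k) &\equiv \int_{-\infty}^{\infty}\dot\varphi(x,t=0)\,e^{-ikx}\,dx, \end{aligned} \tag{6.56} \]
of the Cauchy data. Comparing
\[ \begin{aligned} \varphi(x,t=0) &= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\Phi(k)\,e^{ikx}, \\ \dot\varphi(x,t=0) &= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\chi(k)\,e^{ikx}, \end{aligned} \tag{6.57} \]
with (6.55) shows that
\[ \begin{aligned} \Phi(k) &= a(k) + a^*(-k), \\ \chi(k) &= i\omega_k\left(a^*(-k) - a(k)\right). \end{aligned} \tag{6.58} \]
Solving, we find
\[ \begin{aligned} a(k) &= \frac{1}{2}\left(\Phi(k) + \frac{i}{\omega_k}\chi(k)\right), \\ a^*(k) &= \frac{1}{2}\left(\Phi(-k) - \frac{i}{\omega_k}\chi(-k)\right). \end{aligned} \tag{6.59} \]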


The accumulated wisdom of two hundred years of research on Fourier 
series and Fourier integrals shows that, when appropriately interpreted, this 
solution is equivalent to d’Alembert’s. 


6.3.3. Causal Green function 
We now add a source term: 


\[ \frac{\partial^2\varphi}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2\varphi}{\partial t^2} = q(x,t). \tag{6.60} \]
We solve this equation by finding a Green function such that
\[ \left(\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)G(x,t;\xi,\tau) = \delta(x - \xi)\,\delta(t - \tau). \tag{6.61} \]
If the only waves in the system are those produced by the source, we should
demand that the Green function be causal, in that $G(x,t;\xi,\tau) = 0$ if $t < \tau$.


Figure 6.4: Support of $G(x,t;\xi,\tau)$ for fixed $\xi$, $\tau$, or the “domain of influence.”


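To construct the causal Green function, we integrate the equation over
an infinitesimal time interval from $\tau - \epsilon$ to $\tau + \epsilon$ and so find Cauchy data
\[ \begin{aligned} G(x,\tau + \epsilon;\xi,\tau) &= 0, \\ \frac{d}{dt}G(x,\tau + \epsilon;\xi,\tau) &= -c^2\delta(x - \xi). \end{aligned} \tag{6.62} \]
We insert this data into d’Alembert’s solution to get
\[ \begin{aligned} G(x,t;\xi,\tau) &= -\theta(t - \tau)\,\frac{c}{2}\int_{x - c(t-\tau)}^{x + c(t-\tau)}\delta(\zeta - \xi)\,d\zeta \\ &= -\frac{c}{2}\,\theta(t - \tau)\left\{\theta\big(x - \xi + c(t - \tau)\big) - \theta\big(x - \xi - c(t - \tau)\big)\right\}. \end{aligned} \tag{6.63} \]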


We can now use the Green function to write the solution to the inhomo- 
geneous problem as 


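\[ \varphi(x,t) = \iint G(x,t;\xi,\tau)\,q(\xi,\tau)\,d\tau\,d\xi. \tag{6.64} \]
The step-function form of $G(x,t;\xi,\tau)$ allows us to obtain
\[ \begin{aligned} \varphi(x,t) &= \int_{-\infty}^{t}d\tau\int_{-\infty}^{\infty}d\xi\;G(x,t;\xi,\tau)\,q(\xi,\tau) \\ &= -\frac{c}{2}\int_{-\infty}^{t}d\tau\int_{x - c(t-\tau)}^{x + c(t-\tau)}q(\xi,\tau)\,d\xi \\ &= -\frac{c}{2}\iint_{\Omega}q(\xi,\tau)\,d\tau\,d\xi, \end{aligned} \tag{6.65} \]
where the domain of integration $\Omega$ is shown in figure 6.5.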


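Figure 6.5: The region $\Omega$, or the “domain of dependence.” The region is the
triangle with apex at $(x,t)$ whose base at time $\tau$ runs from $x - c(t-\tau)$ to $x + c(t-\tau)$.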
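The domain-of-dependence integral (6.65) can be sketched in a few lines. This is an illustrative sketch only: the name `driven_string`, the assumption that the source is switched off before `t_start`, and the midpoint-rule grid of `n` by `n` cells are all our own choices.

```python
def driven_string(q, c, x, t, t_start=0.0, n=200):
    """phi(x,t) = -(c/2) * (integral of q over the backward triangle Omega).

    Assumes q(xi, tau) vanishes for tau < t_start.
    """
    dtau = (t - t_start) / n
    total = 0.0
    for i in range(n):
        tau = t_start + (i + 0.5) * dtau
        half_width = c * (t - tau)       # Omega spans x - c(t-tau) .. x + c(t-tau)
        dxi = 2.0 * half_width / n
        left = x - half_width
        inner = sum(q(left + (j + 0.5) * dxi, tau) for j in range(n)) * dxi
        total += inner * dtau
    return -0.5 * c * total

# Uniform unit source switched on at tau = 0: the triangle has area c*t**2,
# so phi = -(c**2) * t**2 / 2, which indeed satisfies (6.60) with q = 1.
print(driven_string(lambda xi, tau: 1.0, 2.0, 0.3, 1.5))
```

A source located outside the triangle contributes nothing, which is the statement of causality in this language.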


We can write the causal Green function in the form of Fourier’s solution 
of the wave equation. We claim that 


\[ G(x,t;\xi,\tau) = -c^2\int_{-\infty}^{\infty}\frac{dk}{2\pi}\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\left\{\frac{e^{ik(x-\xi)}e^{-i\omega(t-\tau)}}{c^2k^2 - (\omega + i\epsilon)^2}\right\}, \tag{6.66} \]
where the $i\epsilon$ plays the same role in enforcing causality as it does for the
harmonic oscillator in one dimension. This is only to be expected. If we
decompose a vibrating string into normal modes, then each mode is an
independent oscillator with $\omega^2 = c^2k^2$, and the Green function for the PDE is
simply the sum of the ODE Green functions for each $k$ mode. To confirm our
claim, we exploit our previous results for the single-oscillator Green function
to evaluate the integral over $\omega$, and we find
\[ G(x,t;0,0) = -\theta(t)\,c^2\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ikx}\,\frac{1}{c|k|}\sin(|k|ct). \tag{6.67} \]
Despite the factor of $1/|k|$, there is no singularity at $k = 0$, so no $i\epsilon$ is
needed to make the integral over $k$ well defined. We can do the $k$ integral
by recognizing that the integrand is nothing but the Fourier representation,
$\frac{2}{k}\sin ak$, of a square-wave pulse. We end up with
\[ G(x,t;0,0) = -\theta(t)\,\frac{c}{2}\left\{\theta(x + ct) - \theta(x - ct)\right\}, \tag{6.68} \]
the same expression as from our direct construction. We can also write
\[ G(x,t;0,0) = -\frac{c}{2}\int_{-\infty}^{\infty}\frac{dk}{2\pi}\left(\frac{i}{|k|}\right)\left\{e^{ikx - ic|k|t} - e^{-ikx + ic|k|t}\right\}, \qquad t > 0, \tag{6.69} \]
which is in explicit Fourier-solution form with $a(k) = -ic/2|k|$.
Illustration: Radiation Damping. Figure 6.6 shows a bead of mass $M$ that
slides without friction on the $y$ axis. The bead is attached to an infinite
string which is initially undisturbed and lying along the $x$ axis. The string has
tension $T$, and a density $\rho$, so the speed of waves on the string is $c = \sqrt{T/\rho}$.
We show that either d’Alembert or Fourier can be used to compute the effect 
of the string on the motion of the bead. 

We first use d’Alembert’s general solution to show that wave energy emit- 
ted by the moving bead gives rise to an effective viscous damping force on 
it. 


Figure 6.6: A bead connected to a string. 


The string tension acting on the bead leads to the equation of
motion $M\dot v = Ty'(0,t)$, and from the condition of no incoming waves we
know that
\[ y(x,t) = y(x - ct). \tag{6.70} \]
Thus $y'(0,t) = -\dot y(0,t)/c$. But the bead is attached to the string, so $v(t) = \dot y(0,t)$,
and therefore
\[ M\dot v = -\left(\frac{T}{c}\right)v. \tag{6.71} \]
The emitted radiation therefore generates a velocity-dependent drag force
with friction coefficient $\eta = T/c$.

We need an infinitely long string for (6.71) to be true for all time. If
the string had a finite length $L$, then, after a period of $2L/c$, energy would be
reflected back to the bead and this would complicate matters.
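As a quick numerical illustration of (6.71), the sketch below assumes a free bead (no potential $V$) launched with velocity `v0`; the function names are ours. It checks that the work done against the drag force, $\int \eta v^2\,dt$, accounts for the kinetic energy the bead has lost, which is the energy carried away along the string.

```python
import math

def drag_coefficient(T, rho):
    """eta = T / c with c = sqrt(T / rho)."""
    return T / math.sqrt(T / rho)

def bead_velocity(v0, M, T, rho, t):
    """Solution of M dv/dt = -(T/c) v with v(0) = v0."""
    return v0 * math.exp(-drag_coefficient(T, rho) * t / M)

def radiated_energy(v0, M, T, rho, t, n=2000):
    """Midpoint-rule estimate of the integral of eta * v**2 from 0 to t."""
    eta = drag_coefficient(T, rho)
    dt = t / n
    return sum(eta * bead_velocity(v0, M, T, rho, (i + 0.5) * dt) ** 2
               for i in range(n)) * dt

M, T, rho, v0, t = 2.0, 3.0, 0.75, 1.0, 1.0
lost = 0.5 * M * (v0**2 - bead_velocity(v0, M, T, rho, t) ** 2)
print(radiated_energy(v0, M, T, rho, t), lost)  # the two numbers agree
```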


Figure 6.7: The function $\phi_0(x)$ and its derivative.


We now show that Fourier’s mode-decomposition of the string motion, 
combined with the Caldeira-Leggett analysis of chapter 5, yields the same 
expression for the radiation damping as the d’Alembert solution. Our bead- 
string contraption has Lagrangian 


\[ L = \frac{M}{2}\left[\dot y(0,t)\right]^2 - V[y(0,t)] + \int_0^L\left\{\frac{\rho}{2}\dot y^2 - \frac{T}{2}y'^2\right\}dx. \tag{6.72} \]
Here, $V[y]$ is some potential energy for the bead.

To deal with the motion of the bead, we introduce a function $\phi_0(x)$ such
that $\phi_0(0) = 1$ and $\phi_0(x)$ decreases rapidly to zero as $x$ increases (see figure
6.7). We therefore have $-\phi_0'(x) \approx \delta(x)$. We expand $y(x,t)$ in terms of $\phi_0(x)$
and the normal modes of a string with fixed ends as
\[ y(x,t) = y(0,t)\,\phi_0(x) + \sum_{n=1}^{\infty}q_n(t)\sqrt{\frac{2}{L\rho}}\,\sin k_nx. \tag{6.73} \]
Here $k_nL = n\pi$. Because $y(0,t)\phi_0(x)$ describes the motion of only an
infinitesimal length of string, $y(0,t)$ makes a negligible contribution to the
string kinetic energy, but it provides a linear coupling of the bead to the
string normal modes, $q_n(t)$, through the $Ty'^2/2$ term. Inserting the mode
expansion into the Lagrangian, and after about half a page of arithmetic, we
end up with


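\[ L = \frac{M}{2}\left[\dot y(0)\right]^2 - V[y(0)] + y(0)\sum_{n=1}^{\infty}f_nq_n + \sum_{n=1}^{\infty}\left(\frac{1}{2}\dot q_n^2 - \frac{1}{2}\omega_n^2q_n^2\right) - \frac{1}{2}\sum_{n=1}^{\infty}\left(\frac{f_n^2}{\omega_n^2}\right)y(0)^2, \tag{6.74} \]
where $\omega_n = ck_n$, and
\[ f_n = T\sqrt{\frac{2}{L\rho}}\;k_n. \tag{6.75} \]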


This is exactly the Caldeira-Leggett Lagrangian, including their frequency-shift
counter-term that reflects the fact that a static displacement of an
infinite string results in no additional force on the bead.$^1$ When $L$ becomes
large, the eigenvalue density of states
\[ \rho(\omega) = \sum_n\delta(\omega - \omega_n) \tag{6.76} \]
becomes
\[ \rho(\omega) = \frac{L}{\pi c}. \tag{6.77} \]
The Caldeira-Leggett spectral function
\[ J(\omega) = \frac{\pi}{2}\sum_n\left(\frac{f_n^2}{\omega_n}\right)\delta(\omega - \omega_n), \tag{6.78} \]
is therefore
\[ J(\omega) = \frac{\pi}{2}\cdot\frac{2T^2k^2}{L\rho}\cdot\frac{1}{kc}\cdot\frac{L}{\pi c} = \left(\frac{T}{c}\right)\omega, \tag{6.79} \]
where we have used $c = \sqrt{T/\rho}$. Comparing with Caldeira-Leggett’s $J(\omega) = \eta\omega$,
we see that the effective viscosity is given by $\eta = T/c$, as before. The
necessity of having an infinitely long string here translates into the
requirement that we must have a continuum of oscillator modes. It is only after the
sum over discrete modes $\omega_i$ is replaced by an integral over the continuum of
$\omega$’s that no energy is ever returned to the system being damped.


$^1$For a finite length of string that is fixed at the far end, the string tension does add
$\frac{1}{2}Ty(0)^2/L$ to the static potential. In the mode expansion, this additional restoring force
arises from the first term of $-\phi_0'(x) \approx 1/L + (2/L)\sum_{n=1}^{\infty}\cos k_nx$ in $\frac{1}{2}Ty(0)^2\int(\phi_0')^2\,dx$. The
subsequent terms provide the Caldeira-Leggett counter-term. The first-term contribution
has been omitted in (6.74) as being unimportant for large $L$.
For our bead and string, the mode-expansion approach is more complicated
than d’Alembert’s. In the important problem of the drag forces
induced by the emission of radiation from an accelerated charged particle,
however, the mode-expansion method leads to an informative resolution$^2$ of
the pathologies of the Abraham-Lorentz equation,
\[ M\left(\dot{\mathbf v} - \tau\ddot{\mathbf v}\right) = \mathbf F_{\rm ext}, \qquad \tau = \frac{2}{3}\,\frac{e^2}{Mc^3}\,\frac{1}{4\pi\epsilon_0}, \tag{6.80} \]
which is plagued by runaway, or apparently acausal, solutions.


6.3.4. Odd vs. even dimensions 


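Consider the wave equation for sound in three dimensions. We have a
velocity potential $\phi$ which obeys the wave equation
\[ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = 0, \tag{6.81} \]
and from which the velocity, density, and pressure fluctuations can be
extracted as
\[ \begin{aligned} \mathbf v_1 &= \nabla\phi, \\ \rho_1 &= -\frac{\rho_0}{c^2}\dot\phi, \\ P_1 &= c^2\rho_1. \end{aligned} \tag{6.82} \]
In three dimensions, and considering only spherically symmetric waves,
the wave equation becomes
\[ \frac{\partial^2(r\phi)}{\partial r^2} - \frac{1}{c^2}\frac{\partial^2(r\phi)}{\partial t^2} = 0, \tag{6.83} \]
with solution
\[ \phi(r,t) = \frac{1}{r}f\left(t - \frac{r}{c}\right) + \frac{1}{r}g\left(t + \frac{r}{c}\right). \tag{6.84} \]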


Consider what happens if we put a point volume source at the origin (the
sudden conversion of a negligible volume of solid explosive to a large volume


$^2$G. W. Ford, R. F. O’Connell, Phys. Lett. A 157 (1991) 217.
of hot gas, for example). Let the rate at which volume is being intruded be
$\dot q$. The gas velocity very close to the origin will be
\[ v(r,t) = \frac{\dot q(t)}{4\pi r^2}. \tag{6.85} \]
Matching this to an outgoing wave gives
\[ \frac{\dot q(t)}{4\pi r^2} = v(r,t) = \frac{\partial\phi}{\partial r} = -\frac{1}{r^2}f\left(t - \frac{r}{c}\right) - \frac{1}{rc}f'\left(t - \frac{r}{c}\right). \tag{6.86} \]
Close to the origin, in the near field, the term $\propto f/r^2$ will dominate, and so
\[ -\frac{1}{4\pi}\dot q(t) = f(t). \tag{6.87} \]
Further away, in the far field or radiation field, only the second term will
survive, and so
\[ v = -\frac{1}{cr}f'\left(t - \frac{r}{c}\right). \tag{6.88} \]
The far-field velocity-pulse profile is therefore the derivative of the
near-field pulse profile.

Figure 6.8: Three-dimensional blast wave. Left panel: near-field $v$. Right panel: far-field $v$ or $P$.


The pressure pulse
\[ P_1 = -\rho_0\dot\phi = \frac{\rho_0}{4\pi r}\,\ddot q\left(t - \frac{r}{c}\right) \tag{6.89} \]
is also of this form. Thus, a sudden localized expansion of gas produces an
outgoing pressure pulse which is first positive and then negative.
This phenomenon can be seen in (old, we hope) news footage of bomb
blasts in tropical regions. A spherical vapour condensation wave can be
seen spreading out from the explosion. The condensation cloud is caused by
the air cooling below the dew-point in the low-pressure region which tails the
over-pressure blast.

Now consider what happens if we have a sheet of explosive, the simultane- 
ous detonation of every part of which gives us a one-dimensional plane-wave 
pulse. We can obtain the plane wave by adding up the individual spherical 
waves from each point on the sheet. 


Figure 6.9: Sheet-source geometry. 


Using the notation defined in figure 6.9, we have 


\[ \phi(x,t) = 2\pi\int_0^{\infty}\frac{1}{\sqrt{x^2 + s^2}}\,f\left(t - \frac{\sqrt{x^2 + s^2}}{c}\right)s\,ds, \tag{6.90} \]
with $f(t) = -\dot q(t)/4\pi$, where now $\dot q$ is the rate at which volume is being
intruded per unit area of the sheet. We can write this as
\[ \begin{aligned} \phi(x,t) &= 2\pi\int_0^{\infty}f\left(t - \frac{\sqrt{x^2 + s^2}}{c}\right)d\sqrt{x^2 + s^2} \\ &= 2\pi c\int_{-\infty}^{t - x/c}f(\tau)\,d\tau \\ &= -\frac{c}{2}\int_{-\infty}^{t - x/c}\dot q(\tau)\,d\tau. \end{aligned} \tag{6.91} \]
In the second line we have defined $\tau = t - \sqrt{x^2 + s^2}/c$, which, inter alia,
interchanged the role of the upper and lower limits on the integral.

Thus, $v = \phi'(x,t) = \frac{1}{2}\dot q(t - x/c)$. Since the near field motion produced
by the intruding gas is $v(r) = \frac{1}{2}\dot q(t)$, the far-field displacement exactly
reproduces the initial motion, suitably delayed of course. (The factor 1/2 is
because half the intruded volume goes towards producing a pulse in the
negative direction.)

In three dimensions, the far-field motion is the first derivative of the near- 
field motion. In one dimension, the far-field motion is exactly the same as 
the near-field motion. In two dimensions the far-field motion should there- 
fore be the half-derivative of the near-field motion — but how do you half- 
differentiate a function? An answer is suggested by the theory of Laplace 
transformations as 


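\[ \left(\frac{d}{dt}\right)^{1/2}F(t) \stackrel{\rm def}{=} \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}\frac{\dot F(\tau)}{\sqrt{t - \tau}}\,d\tau. \tag{6.92} \]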
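The half-derivative (6.92) can be evaluated numerically once the inverse-square-root singularity at $\tau = t$ is tamed. The sketch below is our own illustration: it assumes that $\dot F$ vanishes before a finite time `t_start`, and it substitutes $\tau = t - s^2$, so that $d\tau/\sqrt{t-\tau} = 2\,ds$ and the integrand becomes regular.

```python
import math

def half_derivative(F_dot, t, t_start, n=2000):
    """(1/sqrt(pi)) * integral of F_dot(tau) / sqrt(t - tau) from t_start to t.

    Assumes F_dot(tau) = 0 for tau < t_start. Uses tau = t - s**2 and a
    midpoint rule in s.
    """
    s_max = math.sqrt(t - t_start)
    ds = s_max / n
    total = sum(F_dot(t - ((i + 0.5) * ds) ** 2) for i in range(n)) * ds
    return 2.0 * total / math.sqrt(math.pi)

# F(t) = t for t > 0: the half-derivative is 2*sqrt(t/pi).
print(half_derivative(lambda tau: 1.0, 4.0, 0.0), 2.0 * math.sqrt(4.0 / math.pi))
```

For $F(t) = t^2$ (switched on at $t = 0$) the same routine returns $8t^{3/2}/(3\sqrt{\pi})$, in agreement with the beta-function integral.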


Let us now repeat the explosive sheet calculation for an exploding wire. 


Figure 6.10: Line-source geometry. 


Using the geometry shown in figure 6.10, we have 


\[ ds = d\left(\sqrt{r^2 - x^2}\right) = \frac{r\,dr}{\sqrt{r^2 - x^2}}, \tag{6.93} \]
and combining the contributions of the two parts of the wire that are the
same distance from $p$, we can write
\[ \phi(x,t) = 2\int_0^{\infty}\frac{1}{r}\,f\left(t - \frac{r}{c}\right)ds = 2\int_x^{\infty}f\left(t - \frac{r}{c}\right)\frac{dr}{\sqrt{r^2 - x^2}}, \tag{6.94} \]
with $f(t) = -\dot q(t)/4\pi$, where now $q$ is the volume intruded per unit length.
We may approximate $r^2 - x^2 \approx 2x(r - x)$ for the near parts of the wire where
$r \approx x$, since these make the dominant contribution to the integral. We also
set $\tau = t - r/c$, and then have
\[ \phi(x,t) = \frac{2c}{\sqrt{2x}}\int_{-\infty}^{(t - x/c)}f(\tau)\,\frac{d\tau}{\sqrt{(ct - x) - c\tau}}. \tag{6.95} \]
The far-field velocity is the $x$ gradient of this,
\[ v(x,t) = \frac{1}{2\pi\sqrt{2xc}}\int_{-\infty}^{(t - x/c)}\ddot q(\tau)\,\frac{d\tau}{\sqrt{(t - x/c) - \tau}}, \tag{6.96} \]
and is therefore proportional to the 1/2-derivative of $\dot q(t - r/c)$.


Figure 6.11: In two dimensions the far-field pulse has a long tail. Left panel: near-field $v$. Right panel: far-field $v$.


A plot of near field and far field motions in figure 6.11 shows how the 
far-field pulse never completely dies away to zero. This long tail means that 
one cannot use digital signalling in two dimensions. 
Moral Tale: One of our colleagues was performing numerical work on earthquake
propagation. The source of his waves was a long deep linear fault,
so he used the two-dimensional wave equation. Not wanting to be troubled
by the actual creation of the wave pulse, he took as initial data an outgoing
finite-width pulse. After a short propagation time his numerical solution
appeared to misbehave. New pulses were being emitted from the fault long after
the initial one. He wasted several months in a vain attempt to improve the
stability of his code before he realized that what he was seeing was real. The
lack of a long tail on his pulse meant that it could not have been created by
a briefly-active line source. The new “unphysical” waves were a consequence
of the source striving to suppress the long tail of the initial pulse. Moral:
Always check that a solution of the form you seek actually exists before you
waste your time trying to compute it.


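Exercise 6.4: Use the calculus of improper integrals to show that, provided
$F(-\infty) = 0$, we have
\[ \frac{d}{dt}\left(\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}\frac{\dot F(\tau)}{\sqrt{t - \tau}}\,d\tau\right) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}\frac{\ddot F(\tau)}{\sqrt{t - \tau}}\,d\tau. \tag{6.97} \]
This means that
\[ \frac{d}{dt}\left(\frac{d}{dt}\right)^{1/2}F(t) = \left(\frac{d}{dt}\right)^{1/2}\frac{d}{dt}F(t). \tag{6.98} \]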
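

6.4 Heat equation


Fourier’s heat equation
\[ \frac{\partial\varphi}{\partial t} = \kappa\frac{\partial^2\varphi}{\partial x^2} \tag{6.99} \]
is the archetypal parabolic equation. It often comes with initial data $\varphi(x,t=0)$,
but this is not Cauchy data, as the curve $t = \text{const.}$ is a characteristic.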
The heat equation is also known as the diffusion equation. 


6.4.1 Heat kernel 


If we Fourier transform the initial data 


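\[ \tilde\varphi(k,0) = \int_{-\infty}^{\infty}\varphi(x,0)\,e^{-ikx}\,dx, \tag{6.100} \]
and write
\[ \varphi(x,t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\tilde\varphi(k,t)\,e^{ikx}, \tag{6.101} \]
we can plug this into the heat equation and find that
\[ \frac{\partial\tilde\varphi}{\partial t} = -\kappa k^2\tilde\varphi. \tag{6.102} \]
Hence,
\[ \begin{aligned} \varphi(x,t) &= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\tilde\varphi(k,t)\,e^{ikx} \\ &= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\tilde\varphi(k,0)\,e^{ikx - \kappa k^2t}. \end{aligned} \tag{6.103} \]
We may now express $\tilde\varphi(k,0)$ in terms of $\varphi(x,0)$ and rearrange the order of
integration to get
\[ \begin{aligned} \varphi(x,t) &= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\left(\int_{-\infty}^{\infty}\varphi(\xi,0)\,e^{-ik\xi}\,d\xi\right)e^{ikx - \kappa k^2t} \\ &= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(x-\xi) - \kappa k^2t}\right)\varphi(\xi,0)\,d\xi \\ &= \int_{-\infty}^{\infty}G(x,\xi,t)\,\varphi(\xi,0)\,d\xi, \end{aligned} \tag{6.104} \]
where
\[ G(x,\xi,t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(x-\xi) - \kappa k^2t} = \frac{1}{\sqrt{4\pi\kappa t}}\exp\left\{-\frac{1}{4\kappa t}(x - \xi)^2\right\}. \tag{6.105} \]
Here, $G(x,\xi,t)$ is the heat kernel. It represents the spreading of a unit blob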
of heat. 
Figure 6.12: The heat kernel at three successive times.


As the heat spreads, the total amount of heat, represented by the area 
under the curve in figure 6.12, remains constant: 


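\[ \int_{-\infty}^{\infty}\frac{1}{\sqrt{4\pi\kappa t}}\exp\left\{-\frac{1}{4\kappa t}(x - \xi)^2\right\}dx = 1. \tag{6.106} \]
The heat kernel possesses a semigroup property
\[ G(x,\xi,t_1 + t_2) = \int_{-\infty}^{\infty}G(x,\eta,t_2)\,G(\eta,\xi,t_1)\,d\eta. \tag{6.107} \]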


Exercise: Prove this. 
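Before attempting the proof, it can be reassuring to confirm (6.106) and (6.107) numerically. The sketch below is only a sanity check, not a proof. The helper names, the integration window $[-30, 30]$, and the midpoint rule are our own choices; the window is assumed wide enough that the Gaussian tails are negligible.

```python
import math

def heat_kernel(x, xi, t, kappa=1.0):
    """G(x, xi, t) from (6.105)."""
    return math.exp(-(x - xi) ** 2 / (4.0 * kappa * t)) / math.sqrt(4.0 * math.pi * kappa * t)

def integrate(f, a=-30.0, b=30.0, n=4000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

kappa, x, xi, t1, t2 = 1.3, 0.2, -0.4, 0.3, 0.5
total_heat = integrate(lambda y: heat_kernel(y, xi, t1, kappa))
composed = integrate(lambda eta: heat_kernel(x, eta, t2, kappa) * heat_kernel(eta, xi, t1, kappa))
print(total_heat)                                        # close to 1
print(composed, heat_kernel(x, xi, t1 + t2, kappa))      # the two agree
```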


6.4.2 Causal Green function 


Now we consider the inhomogeneous heat equation 


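\[ \frac{\partial u}{\partial t} - \frac{\partial^2u}{\partial x^2} = q(x,t), \tag{6.108} \]
with initial data $u(x,0) = u_0(x)$. (Here the diffusion constant has been set to $\kappa = 1$.)
We define a Causal Green function by
\[ \left(\frac{\partial}{\partial t} - \frac{\partial^2}{\partial x^2}\right)G(x,t;\xi,\tau) = \delta(x - \xi)\,\delta(t - \tau) \tag{6.109} \]
and the requirement that $G(x,t;\xi,\tau) = 0$ if $t < \tau$. Integrating the equation
from $t = \tau - \epsilon$ to $t = \tau + \epsilon$ tells us that
\[ G(x,\tau + \epsilon;\xi,\tau) = \delta(x - \xi). \tag{6.110} \]
Taking this delta function as initial data $\varphi(x,t=\tau)$ and inserting into (6.104)
we read off
\[ G(x,t;\xi,\tau) = \theta(t - \tau)\,\frac{1}{\sqrt{4\pi(t - \tau)}}\exp\left\{-\frac{1}{4(t - \tau)}(x - \xi)^2\right\}. \tag{6.111} \]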


We apply this Green function to the solution of a problem involving both 
a heat source and initial data given at t = 0 on the entire real line. We 
exploit a variant of the Lagrange-identity method we used for solving one- 
dimensional ODE’s with inhomogeneous boundary conditions. Let 


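\[ D_{x,t} \equiv \frac{\partial}{\partial t} - \frac{\partial^2}{\partial x^2}, \tag{6.112} \]
and observe that its formal adjoint,
\[ D^\dagger_{x,t} \equiv -\frac{\partial}{\partial t} - \frac{\partial^2}{\partial x^2}, \tag{6.113} \]
is a “backward” heat-equation operator. The corresponding “backward”
Green function
\[ G^\dagger(x,t;\xi,\tau) = \theta(\tau - t)\,\frac{1}{\sqrt{4\pi(\tau - t)}}\exp\left\{-\frac{1}{4(\tau - t)}(x - \xi)^2\right\} \tag{6.114} \]
obeys
\[ D^\dagger_{x,t}\,G^\dagger(x,t;\xi,\tau) = \delta(x - \xi)\,\delta(t - \tau), \tag{6.115} \]
with adjoint boundary conditions. These make $G^\dagger$ anti-causal, in that $G^\dagger(t - \tau)$
vanishes when $t > \tau$. Now we make use of the two-dimensional Lagrange
identity
\[ \begin{aligned} &\int_{-\infty}^{\infty}dx\int_0^Tdt\left\{u(x,t)\,D^\dagger_{x,t}G^\dagger(x,t;\xi,\tau) - \left(D_{x,t}u(x,t)\right)G^\dagger(x,t;\xi,\tau)\right\} \\ &\qquad = \int_{-\infty}^{\infty}dx\left\{u(x,0)\,G^\dagger(x,0;\xi,\tau)\right\} - \int_{-\infty}^{\infty}dx\left\{u(x,T)\,G^\dagger(x,T;\xi,\tau)\right\}. \end{aligned} \tag{6.116} \]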
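Assume that $(\xi,\tau)$ lies within the region of integration. Then the left hand
side is equal to
\[ u(\xi,\tau) - \int_{-\infty}^{\infty}dx\int_0^Tdt\left\{q(x,t)\,G^\dagger(x,t;\xi,\tau)\right\}. \tag{6.117} \]
On the right hand side, the second integral vanishes because $G^\dagger$ is zero on
$t = T$. Thus,
\[ u(\xi,\tau) = \int_{-\infty}^{\infty}dx\int_0^Tdt\left\{q(x,t)\,G^\dagger(x,t;\xi,\tau)\right\} + \int_{-\infty}^{\infty}\left\{u(x,0)\,G^\dagger(x,0;\xi,\tau)\right\}dx. \tag{6.118} \]
Rewriting this by using
\[ G^\dagger(x,t;\xi,\tau) = G(\xi,\tau;x,t), \tag{6.119} \]
and relabeling $x \leftrightarrow \xi$ and $t \leftrightarrow \tau$, we have
\[ u(x,t) = \int_{-\infty}^{\infty}G(x,t;\xi,0)\,u_0(\xi)\,d\xi + \int_{-\infty}^{\infty}\int_0^tG(x,t;\xi,\tau)\,q(\xi,\tau)\,d\xi\,d\tau. \tag{6.120} \]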


Note how the effects of any heat source q(x, t) active prior to the initial-data 
epoch at t = 0 have been subsumed into the evolution of the initial data. 


6.4.3. Duhamel’s principle 


Often, the temperature of the spatial boundary of a region is specified in 
addition to the initial data. Dealing with this type of problem leads us to a 
new strategy. 

Suppose we are required to solve 


\[ \frac{\partial u}{\partial t} = \frac{\partial^2u}{\partial x^2} \tag{6.121} \]
for the semi-infinite rod shown in figure 6.13. We are given a specified
temperature, $u(0,t) = h(t)$, at the end $x = 0$, and for all other points $x > 0$ we
are given an initial condition $u(x,0) = 0$.
Figure 6.13: Semi-infinite rod heated at one end.


We begin by finding a solution $w(x,t)$ that satisfies the heat equation with
$w(0,t) = 1$ and initial data $w(x,0) = 0$, $x > 0$. This solution is constructed
in problem 6.14, and is
\[ w = \theta(t)\left\{1 - {\rm erf}\left(\frac{x}{2\sqrt{t}}\right)\right\}. \tag{6.122} \]
Here ${\rm erf}(x)$ is the error function
\[ {\rm erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^xe^{-z^2}\,dz, \tag{6.123} \]
which has the properties that ${\rm erf}(0) = 0$ and ${\rm erf}(x) \to 1$ as $x \to \infty$. See
figure 6.14.


Figure 6.14: Error function.


If we were given
\[ h(t) = h_0\,\theta(t - t_0), \tag{6.124} \]
then the desired solution would be
\[ u(x,t) = h_0\,w(x,t - t_0). \tag{6.125} \]
For a sum
\[ h(t) = \sum_nh_n\,\theta(t - t_n), \tag{6.126} \]
the principle of superposition (i.e. the linearity of the problem) tells us that
the solution is the corresponding sum
\[ u(x,t) = \sum_nh_n\,w(x,t - t_n). \tag{6.127} \]
We therefore decompose $h(t)$ into a sum of step functions
\[ \begin{aligned} h(t) &= h(0) + \int_0^t\dot h(\tau)\,d\tau \\ &= h(0) + \int_0^{\infty}\theta(t - \tau)\,\dot h(\tau)\,d\tau. \end{aligned} \tag{6.128} \]
It should now be clear that
\[ \begin{aligned} u(x,t) &= \int_0^tw(x,t - \tau)\,\dot h(\tau)\,d\tau + h(0)\,w(x,t) \\ &= -\int_0^t\left(\frac{\partial}{\partial\tau}w(x,t - \tau)\right)h(\tau)\,d\tau \\ &= \int_0^t\left(\frac{\partial}{\partial t}w(x,t - \tau)\right)h(\tau)\,d\tau. \end{aligned} \tag{6.129} \]
This is called Duhamel’s solution, and the trick of expressing the data as a
sum of Heaviside step functions is called Duhamel’s principle.

We do not need to be as clever as Duhamel. We could have obtained 
this result by using the method of images to find a suitable causal Green 
function for the half line, and then using the same Lagrange-identity method 
as before. 
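Duhamel's superposition of step responses is easy to sketch numerically. The code below is illustrative only: it assumes unit diffusivity, takes the step response to be $w(x,t) = 1 - {\rm erf}(x/2\sqrt{t})$ for $t > 0$, and uses the form $u = h(0)\,w(x,t) + \int_0^t w(x,t-\tau)\,\dot h(\tau)\,d\tau$ with a midpoint rule. The function names are ours.

```python
import math

def w(x, t):
    """Response to a unit step in the end temperature (unit diffusivity assumed)."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.erf(x / (2.0 * math.sqrt(t)))

def duhamel(h, h_dot, x, t, n=4000):
    """u(x,t) = h(0) w(x,t) + integral_0^t w(x, t - tau) h_dot(tau) dtau."""
    dtau = t / n
    integral = sum(w(x, t - (i + 0.5) * dtau) * h_dot((i + 0.5) * dtau)
                   for i in range(n)) * dtau
    return h(0.0) * w(x, t) + integral

# End temperature ramped linearly, h(t) = t:
x, t = 1.0, 2.0
print(duhamel(lambda s: s, lambda s: 1.0, x, t))
```

For the linear ramp the result can be compared with the closed form $(t + x^2/2)\,{\rm erfc}(x/2\sqrt{t}) - x\sqrt{t/\pi}\,e^{-x^2/4t}$, obtained by integrating the step response in time.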


6.5 Potential theory 


The study of boundary-value problems involving the Laplacian is usually
known as “Potential Theory.” We seek solutions to these problems in some
region $\Omega$, whose boundary we denote by the symbol $\partial\Omega$.
Poisson’s equation, $-\nabla^2\chi(\mathbf r) = f(\mathbf r)$, $\mathbf r \in \Omega$, and the Laplace equation to
which it reduces when $f(\mathbf r) \equiv 0$, come along with various boundary
conditions, of which the commonest are
\[ \begin{aligned} \chi &= g(\mathbf r) \quad \text{on } \partial\Omega, \quad &&\text{(Dirichlet)} \\ (\mathbf n\cdot\nabla)\chi &= g(\mathbf r) \quad \text{on } \partial\Omega. \quad &&\text{(Neumann)} \end{aligned} \tag{6.130} \]
A function for which $\nabla^2\chi = 0$ in some region $\Omega$ is said to be harmonic there.


6.5.1 Uniqueness and existence of solutions 


We begin by observing that we need to be a little more precise about what
it means for a solution to “take” a given value on a boundary. If we ask for
a solution to the problem $\nabla^2\varphi = 0$ within $\Omega = \{(x,y) \in \mathbb R^2 : x^2 + y^2 < 1\}$
and $\varphi = 1$ on $\partial\Omega$, someone might claim that the function defined by setting
$\varphi(x,y) = 0$ for $x^2 + y^2 < 1$ and $\varphi(x,y) = 1$ for $x^2 + y^2 = 1$ does the job,
but such a discontinuous “solution” is hardly what we had in mind when we
stated the problem. We must interpret the phrase “takes a given value on the
boundary” as meaning that the boundary data is the limit, as we approach
the boundary, of the solution within $\Omega$.

With this understanding, we assert that a function harmonic in a bounded
subset $\Omega$ of $\mathbb R^n$ is uniquely determined by the values it takes on the boundary
of $\Omega$. To see that this is so, suppose that $\varphi_1$ and $\varphi_2$ both satisfy $\nabla^2\varphi = 0$ in
$\Omega$, and coincide on the boundary. Then $\chi = \varphi_1 - \varphi_2$ obeys $\nabla^2\chi = 0$ in $\Omega$,
and is zero on the boundary. Integrating by parts we find that
\[ \int_\Omega|\nabla\chi|^2\,d^nr = \int_{\partial\Omega}\chi\,(\mathbf n\cdot\nabla)\chi\,dS = 0. \tag{6.131} \]
Here $dS$ is the element of area on the boundary and $\mathbf n$ the outward-directed
normal. Now, because the second derivatives exist, the partial derivatives
entering into $\nabla\chi$ must be continuous, and so the vanishing of the integral of
$|\nabla\chi|^2$ tells us that $\nabla\chi$ is zero everywhere within $\Omega$. This means that $\chi$ is
constant, and because it is zero on the boundary it is zero everywhere.
An almost identical argument shows that if $\Omega$ is a bounded connected
region, and if $\varphi_1$ and $\varphi_2$ both satisfy $\nabla^2\varphi = 0$ within $\Omega$ and take the same
values of $(\mathbf n\cdot\nabla)\varphi$ on the boundary, then $\varphi_1 = \varphi_2 + \text{const}$. We have therefore
shown that, if it exists, the solution of the Dirichlet boundary value problem
is unique, and the solution of the Neumann problem is unique up to the
addition of an arbitrary constant.

In the Neumann case, with boundary condition $(\mathbf n\cdot\nabla)\varphi = g(\mathbf r)$,
integration by parts gives
\[ \int_\Omega\nabla^2\varphi\,d^nr = \int_{\partial\Omega}(\mathbf n\cdot\nabla)\varphi\,dS = \int_{\partial\Omega}g\,dS, \tag{6.132} \]
and so the boundary data $g(\mathbf r)$ must satisfy $\int_{\partial\Omega}g\,dS = 0$ if a solution to
$\nabla^2\varphi = 0$ is to exist. This is an example of the Fredholm alternative that
relates the existence of a non-trivial null space to constraints on the source
terms. For the inhomogeneous equation $-\nabla^2\varphi = f$, the Fredholm constraint
becomes
\[ \int_{\partial\Omega}g\,dS + \int_\Omega f\,d^nr = 0. \tag{6.133} \]
Given that we have satisfied any Fredholm constraint, do solutions to the
Dirichlet and Neumann problem always exist? That solutions should exist is
suggested by physics: the Dirichlet problem corresponds to an electrostatic
problem with specified boundary potentials and the Neumann problem
corresponds to finding the electric potential within a resistive material with
prescribed current sources on the boundary. The Fredholm constraint says
that if we drive current into the material, we must let it out somewhere.
Surely solutions always exist to these physics problems? In the Dirichlet case
we can even make a mathematically plausible argument for existence: We
observe that the boundary-value problem
\[ \begin{aligned} \nabla^2\varphi &= 0, \quad \mathbf r \in \Omega, \\ \varphi &= f, \quad \mathbf r \in \partial\Omega \end{aligned} \tag{6.134} \]
is solved by taking $\varphi$ to be the $\chi$ that minimizes the functional
\[ J[\chi] = \int_\Omega|\nabla\chi|^2\,d^nr \tag{6.135} \]
over the set of continuously differentiable functions taking the given boundary
values. Since $J[\chi]$ is positive, and hence bounded below, it seems intuitively
obvious that there must be some function $\chi$ for which $J[\chi]$ is a minimum.
The appeal of this Dirichlet principle argument led even Riemann astray.
The fallacy was exposed by Weierstrass who provided counterexamples.
Consider, for example, the problem of finding a function $\varphi(x,y)$ obeying
$\nabla^2\varphi = 0$ within the punctured disc $D' = \{(x,y) \in \mathbb R^2 : 0 < x^2 + y^2 < 1\}$
with boundary data $\varphi(x,y) = 1$ on the outer boundary at $x^2 + y^2 = 1$ and
$\varphi(0,0) = 0$ on the inner boundary at the origin. We substitute the trial
functions
\[ \chi_\alpha(x,y) = (x^2 + y^2)^\alpha, \qquad \alpha > 0, \tag{6.136} \]
all of which satisfy the boundary data, into the positive functional
\[ J[\chi] = \int_{D'}|\nabla\chi|^2\,dx\,dy \tag{6.137} \]
to find $J[\chi_\alpha] = 2\pi\alpha$. This number can be made as small as we like, and so
the infimum of the functional $J[\chi]$ is zero. But if there is a minimizing $\varphi$,
then $J[\varphi] = 0$ implies that $\varphi$ is a constant, and a constant cannot satisfy the
boundary conditions.
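For readers who want to see where the value $2\pi\alpha$ comes from, here is the short computation in plane polar coordinates (our own worked step, with $r^2 = x^2 + y^2$ so that $\chi_\alpha = r^{2\alpha}$):

```latex
\begin{aligned}
|\nabla\chi_\alpha|^2 &= \left(\frac{d}{dr}\,r^{2\alpha}\right)^2 = 4\alpha^2 r^{4\alpha-2},\\
J[\chi_\alpha] &= \int_0^{2\pi}d\theta\int_0^1 4\alpha^2 r^{4\alpha-2}\,r\,dr
 = 8\pi\alpha^2\int_0^1 r^{4\alpha-1}\,dr
 = 8\pi\alpha^2\cdot\frac{1}{4\alpha} = 2\pi\alpha .
\end{aligned}
```

The radial integral converges at $r = 0$ precisely because $\alpha > 0$, so every trial function has finite energy even though the sequence has no admissible limit as $\alpha \to 0$.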

An analogous problem reveals itself in three dimensions when the boundary
of $\Omega$ has a sharp re-entrant spike that is held at a different potential from
the rest of the boundary. In this case we can again find a sequence of trial
functions $\chi(\mathbf r)$ for which $J[\chi]$ becomes arbitrarily small, but the sequence of
$\chi$’s has no limit satisfying the boundary conditions. The physics argument
also fails: if we tried to create a physical realization of this situation, the
electric field would become infinite near the spike, and the charge would leak
off and thwart our attempts to establish the potential difference. For
reasonably smooth boundaries, however, a minimizing function does exist.

The Dirichlet-Poisson problem 


\[ \begin{aligned} -\nabla^2\varphi(\mathbf r) &= f(\mathbf r), \quad \mathbf r \in \Omega, \\ \varphi(\mathbf r) &= g(\mathbf r), \quad \mathbf r \in \partial\Omega, \end{aligned} \tag{6.138} \]
and the Neumann-Poisson problem
\[ \begin{aligned} -\nabla^2\varphi(\mathbf r) &= f(\mathbf r), \quad \mathbf r \in \Omega, \\ (\mathbf n\cdot\nabla)\varphi(\mathbf r) &= g(\mathbf r), \quad \mathbf r \in \partial\Omega, \end{aligned} \]
supplemented with the Fredholm constraint
\[ \int_\Omega f\,d^nr + \int_{\partial\Omega}g\,dS = 0, \tag{6.139} \]
also have solutions when $\partial\Omega$ is reasonably smooth. For the Neumann-Poisson
problem, with the Fredholm constraint as stated, the region $\Omega$ must be
connected, but its boundary need not be. For example, $\Omega$ can be the region
between two nested spherical shells.


Exercise 6.5: Why did we insist that the region $\Omega$ be connected in our
discussion of the Neumann problem? (Hint: how must we modify the Fredholm
constraint when $\Omega$ consists of two or more disconnected regions?)


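Exercise 6.6: Neumann variational principles. Let $\Omega$ be a bounded and
connected three-dimensional region with a smooth boundary. Given a function $f$
defined on $\Omega$ and such that $\int_\Omega f\,d^3r = 0$, define the functional
\[ J[\chi] = \int_\Omega\left\{\frac{1}{2}|\nabla\chi|^2 - \chi f\right\}d^3r. \]
Suppose that $\varphi$ is a solution of the Neumann problem
\[ \begin{aligned} -\nabla^2\varphi(\mathbf r) &= f(\mathbf r), \quad \mathbf r \in \Omega, \\ (\mathbf n\cdot\nabla)\varphi(\mathbf r) &= 0, \quad \mathbf r \in \partial\Omega. \end{aligned} \]
Show that
\[ J[\chi] = J[\varphi] + \int_\Omega\frac{1}{2}|\nabla(\chi - \varphi)|^2\,d^3r \geq J[\varphi] = -\int_\Omega\frac{1}{2}|\nabla\varphi|^2\,d^3r = -\frac{1}{2}\int_\Omega\varphi f\,d^3r. \]
Deduce that $\varphi$ is determined, up to the addition of a constant, as the function
that minimizes $J[\chi]$ over the space of all continuously differentiable $\chi$ (and
not just over functions satisfying the Neumann boundary condition).

Similarly, for $g$ a function defined on the boundary $\partial\Omega$ and such that
$\int_{\partial\Omega}g\,dS = 0$, set
\[ K[\chi] = \int_\Omega\frac{1}{2}|\nabla\chi|^2\,d^3r - \int_{\partial\Omega}\chi g\,dS. \]
Now suppose that $\phi$ is a solution of the Neumann problem
\[ \begin{aligned} -\nabla^2\phi(\mathbf r) &= 0, \quad \mathbf r \in \Omega, \\ (\mathbf n\cdot\nabla)\phi(\mathbf r) &= g(\mathbf r), \quad \mathbf r \in \partial\Omega. \end{aligned} \]
Show that
\[ K[\chi] = K[\phi] + \int_\Omega\frac{1}{2}|\nabla(\chi - \phi)|^2\,d^3r \geq K[\phi] = -\int_\Omega\frac{1}{2}|\nabla\phi|^2\,d^3r = -\frac{1}{2}\int_{\partial\Omega}\phi g\,dS. \]
Deduce that $\phi$ is determined up to a constant as the function that minimizes
$K[\chi]$ over the space of all continuously differentiable $\chi$ (and, again, not just
over functions satisfying the Neumann boundary condition).

Show that when $f$ and $g$ fail to satisfy the integral conditions required for
the existence of the Neumann solution, the corresponding functionals are not
bounded below, and so no minimizing function can exist.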


Exercise 6.7: Helmholtz decomposition. Let $\Omega$ be a bounded connected
three-dimensional region with smooth boundary $\partial\Omega$.

a) Cite the conditions for the existence of a solution to a suitable Neumann
problem to show that if $\mathbf u$ is a smooth vector field defined in $\Omega$, then
there exists a unique solenoidal (i.e. having zero divergence) vector field
$\mathbf v$ with $\mathbf v\cdot\mathbf n = 0$ on the boundary $\partial\Omega$, and a unique (up to the addition
of a constant) scalar field $\phi$ such that
\[ \mathbf u = \mathbf v + \nabla\phi. \]
Here $\mathbf n$ is the outward normal to the (assumed smooth) bounding surface
of $\Omega$.

b) In many cases (but not always) we can write a solenoidal vector field $\mathbf v$
as $\mathbf v = {\rm curl}\,\mathbf w$. Again by appealing to the conditions for existence and
uniqueness of a Neumann problem solution, show that if we can write
$\mathbf v = {\rm curl}\,\mathbf w$, then $\mathbf w$ is not unique, but we can always make it unique by
demanding that it obey the conditions ${\rm div}\,\mathbf w = 0$ and $\mathbf w\cdot\mathbf n = 0$.

c) Appeal to the Helmholtz decomposition of part a) with $\mathbf u \to (\mathbf v\cdot\nabla)\mathbf v$ to
show that in the Euler equation
\[ \frac{\partial\mathbf v}{\partial t} + (\mathbf v\cdot\nabla)\mathbf v = -\nabla P, \qquad \mathbf v\cdot\mathbf n = 0 \text{ on } \partial\Omega, \]
governing the motion of an incompressible (${\rm div}\,\mathbf v = 0$) fluid the
instantaneous flow field $\mathbf v(x,y,z,t)$ uniquely determines $\partial\mathbf v/\partial t$, and hence the
time evolution of the flow. (This observation provides the basis of
practical algorithms for computing incompressible flows.)


We can always write the solenoidal field as $\mathbf v = {\rm curl}\,\mathbf w + \mathbf h$, where $\mathbf h$ obeys
$\nabla^2\mathbf h = 0$ with suitable boundary conditions. See exercise 6.16.


6.5.2 Separation of variables 


Cartesian coordinates 


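When the region of interest is a square or a rectangle, we can solve Laplace
boundary problems by separating the Laplace operator in cartesian co-ordinates.
Let
\[ \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} = 0, \tag{6.140} \]
and write
\[ \varphi = X(x)Y(y), \tag{6.141} \]
so that
\[ \frac{1}{X}\frac{\partial^2X}{\partial x^2} + \frac{1}{Y}\frac{\partial^2Y}{\partial y^2} = 0. \tag{6.142} \]
Since the first term is a function of $x$ only, and the second of $y$ only, both
must be constants and the sum of these constants must be zero. Therefore
\[ \begin{aligned} \frac{1}{X}\frac{\partial^2X}{\partial x^2} &= -k^2, \\ \frac{1}{Y}\frac{\partial^2Y}{\partial y^2} &= k^2, \end{aligned} \tag{6.143} \]
or, equivalently
\[ \begin{aligned} \frac{\partial^2X}{\partial x^2} + k^2X &= 0, \\ \frac{\partial^2Y}{\partial y^2} - k^2Y &= 0. \end{aligned} \tag{6.144} \]
The number that we have, for later convenience, written as $k^2$ is called a
separation constant. The solutions are $X = e^{\pm ikx}$ and $Y = e^{\pm ky}$. Thus
\[ \varphi = e^{\pm ikx}e^{\pm ky}, \tag{6.145} \]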


or a sum of such terms where the allowed k’s are determined by the boundary 
conditions. 

How do we know that the separated form X (x)Y (y) captures all possible 
solutions? We can be confident that we have them all if we can use the sep- 
arated solutions to solve boundary-value problems with arbitrary boundary 
data. 


Figure 6.15: Square region. 


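We can use our separated solutions to construct the unique harmonic
function taking given values on the sides of a square of side $L$ shown in figure
6.15. To see how to do this, consider the four families of functions

$$
\begin{aligned}
\varphi_{1,n} &= \sqrt{\frac{2}{L}}\,\frac{1}{\sinh n\pi}\,\sin\frac{n\pi x}{L}\,\sinh\frac{n\pi y}{L},\\
\varphi_{2,n} &= \sqrt{\frac{2}{L}}\,\frac{1}{\sinh n\pi}\,\sinh\frac{n\pi x}{L}\,\sin\frac{n\pi y}{L},\\
\varphi_{3,n} &= \sqrt{\frac{2}{L}}\,\frac{1}{\sinh n\pi}\,\sin\frac{n\pi x}{L}\,\sinh\frac{n\pi (L-y)}{L},\\
\varphi_{4,n} &= \sqrt{\frac{2}{L}}\,\frac{1}{\sinh n\pi}\,\sinh\frac{n\pi (L-x)}{L}\,\sin\frac{n\pi y}{L}.
\end{aligned}
\tag{6.146}
$$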


Each of these comprises solutions to $\nabla^2\varphi = 0$. The family $\varphi_{1,n}(x,y)$ has been
constructed so that every member is zero on three sides of the square, but
on the side $y = L$ it becomes $\varphi_{1,n}(x,L) = \sqrt{2/L}\,\sin(n\pi x/L)$. The $\varphi_{1,n}(x,L)$
therefore constitute a complete orthonormal set in terms of which we can
expand the boundary data on the side $y = L$. Similarly, the other
families are non-zero on only one side, and are complete there. Thus, any
boundary data can be expanded in terms of these four function sets, and the
solution to the boundary value problem is given by a sum

$$ \varphi(x,y) = \sum_{m=1}^{4}\sum_{n=1}^{\infty} a_{m,n}\,\varphi_{m,n}(x,y). \tag{6.147} $$
The solution to $\nabla^2\varphi = 0$ in the unit square with $\varphi = 1$ on the side $y = 1$
and zero on the other sides is, for example,

$$ \varphi(x,y) = \sum_{n=0}^{\infty}\frac{4}{(2n+1)\pi}\,\frac{1}{\sinh\big((2n+1)\pi\big)}\,\sin\big((2n+1)\pi x\big)\,\sinh\big((2n+1)\pi y\big). \tag{6.148} $$


Figure 6.16: Plot of first thirty terms in equation (6.148).
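
The partial sums of (6.148) are easy to evaluate numerically. The following Python sketch sums the first thirty terms; the function name `phi_unit_square` and the overflow-safe rewriting of the ratio of hyperbolic sines are illustrative choices, not part of the text. Because the four single-side solutions add up to the constant function 1, symmetry requires the value at the centre of the square to be 1/4, which provides a convenient check.

```python
import math


def phi_unit_square(x, y, terms=30):
    """Partial sum of the series (6.148) for the unit square.

    The ratio sinh(a*y)/sinh(a) is rewritten with decaying exponentials
    so that large values of a = (2n+1)*pi do not overflow.
    """
    total = 0.0
    for n in range(terms):
        a = (2 * n + 1) * math.pi
        ratio = (math.exp(-a * (1.0 - y))
                 * (1.0 - math.exp(-2.0 * a * y))
                 / (1.0 - math.exp(-2.0 * a)))
        total += 4.0 / a * math.sin(a * x) * ratio
    return total


if __name__ == "__main__":
    # By symmetry the centre value should be very close to 1/4.
    print(phi_unit_square(0.5, 0.5))
```

Near the side $y = 1$ the series converges slowly, and a truncated sum shows the usual Gibbs oscillations at the corners where the boundary data jump.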


For cubes, and higher dimensional hypercubes, we can use similar boundary
expansions. For the unit cube in three dimensions we would use

$$ \varphi_{1,nm}(x,y,z) = \frac{1}{\sinh\big(\pi\sqrt{n^2+m^2}\big)}\,\sin(n\pi x)\,\sin(m\pi y)\,\sinh\big(\pi z\sqrt{n^2+m^2}\big), $$

to expand the data on the face $z = 1$, together with five other solution
families, one for each of the other five faces of the cube.

If some of the boundaries are at infinity, we may need only some of
these functions.
Example: Figure 6.17 shows three conducting sheets, each infinite in the $z$
direction. The central one has width $a$, and is held at voltage $V_0$. The outer
two extend to infinity also in the $y$ direction, and are grounded. The resulting
potential should tend to zero as $|x|, |y| \to \infty$.


Figure 6.17: Conducting sheets. 


The voltage in the $x = 0$ plane is

$$ \varphi(0,y,z) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,a(k)\,e^{-iky}, \tag{6.149} $$
where
$$ a(k) = V_0\int_{-a/2}^{a/2} e^{iky}\,dy = \frac{2V_0}{k}\sin(ka/2). \tag{6.150} $$

Then, taking into account the boundary condition at large $x$, the solution to
$\nabla^2\varphi = 0$ is
$$ \varphi(x,y,z) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,a(k)\,e^{-iky}\,e^{-|k||x|}. \tag{6.151} $$
The evaluation of this integral, and finding the charge distribution on the
sheets, is left as an exercise.


The Cauchy problem is ill-posed 


Although the Laplace equation has no characteristics, the Cauchy data problem
is ill-posed, meaning that the solution is not a continuous function of the
data. To see this, suppose we are given $\nabla^2\varphi = 0$ with Cauchy data on $y = 0$:

$$ \varphi(x,0) = 0, \qquad \left.\frac{\partial\varphi}{\partial y}\right|_{y=0} = \epsilon\sin kx. \tag{6.152} $$
Then
$$ \varphi(x,y) = \frac{\epsilon}{k}\,\sin(kx)\,\sinh(ky). \tag{6.153} $$

Provided $k$ is large enough, even if $\epsilon$ is tiny, the exponential growth of the
hyperbolic sine will make this arbitrarily large. Any infinitesimal uncertainty
in the high frequency part of the initial data will be vastly amplified, and
the solution, although formally correct, is useless in practice.
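
To get a feel for the size of this amplification, the short Python sketch below evaluates the amplitude $(\epsilon/k)\sinh(ky)$ of the solution (6.153). The function name `cauchy_amplitude` and the sample values of $\epsilon$, $k$ and $y$ are illustrative assumptions only.

```python
import math


def cauchy_amplitude(eps, k, y):
    """Amplitude (eps/k)*sinh(k*y) of the solution (6.153) at height y."""
    return eps / k * math.sinh(k * y)


if __name__ == "__main__":
    eps = 1e-10  # tiny perturbation of the normal derivative on y = 0
    for k in (1.0, 10.0, 100.0):
        # The same tiny data produce wildly different responses at y = 1.
        print(k, cauchy_amplitude(eps, k, 1.0))
```

With these sample values the low-frequency perturbation remains of order $10^{-10}$ at $y = 1$, while the $k = 100$ perturbation has grown beyond $10^{30}$.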


Polar coordinates 


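We can use the separation of variables method in polar coordinates. Here,

$$ \nabla^2\chi = \frac{\partial^2\chi}{\partial r^2} + \frac{1}{r}\frac{\partial\chi}{\partial r} + \frac{1}{r^2}\frac{\partial^2\chi}{\partial\theta^2}. \tag{6.154} $$
Set
$$ \chi(r,\theta) = R(r)\Theta(\theta). \tag{6.155} $$

Then $\nabla^2\chi = 0$ implies

$$
\begin{aligned}
0 &= \frac{r^2}{R}\left(\frac{\partial^2 R}{\partial r^2} + \frac{1}{r}\frac{\partial R}{\partial r}\right) + \frac{1}{\Theta}\frac{\partial^2\Theta}{\partial\theta^2}\\
  &= m^2 - m^2,
\end{aligned}
\tag{6.156}
$$
where in the second line we have written the separation constant as $m^2$.
Therefore,
$$ \frac{d^2\Theta}{d\theta^2} + m^2\Theta = 0, \tag{6.157} $$
implying that $\Theta = e^{im\theta}$, where $m$ must be an integer if $\Theta$ is to be
single-valued, and
$$ r^2\frac{d^2R}{dr^2} + r\frac{dR}{dr} - m^2R = 0, \tag{6.158} $$
whose solutions are $R = r^{\pm m}$ when $m \ne 0$, and $1$ or $\ln r$ when $m = 0$. The
general solution is therefore a sum of these

$$ \chi = A_0 + B_0\ln r + \sum_{m\ne 0}\left(A_m r^{|m|} + B_m r^{-|m|}\right)e^{im\theta}. \tag{6.159} $$

The singular terms, $\ln r$ and $r^{-|m|}$, are not solutions at the origin, and should
be omitted when that point is part of the region where $\nabla^2\chi = 0$.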
Example: Dirichlet problem in the interior of the unit circle. Solve $\nabla^2\chi = 0$
in $\Omega = \{\mathbf{r}\in\mathbb{R}^2 : |\mathbf{r}| < 1\}$ with $\chi = f(\theta)$ on $\partial\Omega = \{|\mathbf{r}| = 1\}$.


Figure 6.18: Dirichlet problem in the unit circle. 


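We expand

$$ \chi(r,\theta) = \sum_{m=-\infty}^{\infty} A_m r^{|m|}e^{im\theta}, \tag{6.160} $$

and read off the coefficients from the boundary data as

$$ A_m = \frac{1}{2\pi}\int_0^{2\pi} e^{-im\theta'}f(\theta')\,d\theta'. \tag{6.161} $$
Thus,
$$ \chi = \frac{1}{2\pi}\int_0^{2\pi}\left[\sum_{m=-\infty}^{\infty} r^{|m|}e^{im(\theta-\theta')}\right]f(\theta')\,d\theta'. \tag{6.162} $$
We can sum the geometric series
$$
\begin{aligned}
\sum_{m=-\infty}^{\infty} r^{|m|}e^{im(\theta-\theta')}
 &= \frac{1}{1 - re^{i(\theta-\theta')}} + \frac{re^{-i(\theta-\theta')}}{1 - re^{-i(\theta-\theta')}}\\
 &= \frac{1 - r^2}{1 - 2r\cos(\theta-\theta') + r^2}.
\end{aligned}
\tag{6.163}
$$

Therefore,

$$ \chi(r,\theta) = \frac{1}{2\pi}\int_0^{2\pi}\left(\frac{1 - r^2}{1 - 2r\cos(\theta-\theta') + r^2}\right)f(\theta')\,d\theta'. \tag{6.164} $$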
This expression is known as the Poisson kernel formula. Observe how the
integrand sharpens towards a delta function as $r$ approaches unity, and so
ensures that the limiting value of $\chi(r,\theta)$ is consistent with the boundary
data.
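
The Poisson kernel formula can be checked numerically. The sketch below is a minimal Python illustration, assuming a uniform grid in $\theta'$ (the rectangle rule, which converges very rapidly for smooth periodic integrands); the name `poisson_disc` and the grid size are hypothetical choices. For boundary data $f(\theta) = \cos\theta$ the harmonic extension is $r\cos\theta$, so the integral should reproduce that value.

```python
import math


def poisson_disc(f, r, theta, n=2000):
    """Approximate the Poisson integral for the unit disc at (r, theta).

    f     : boundary data as a function of the angle theta'
    r     : radius, 0 <= r < 1
    theta : polar angle of the interior point
    n     : number of equally spaced quadrature points on the circle
    """
    h = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        tp = j * h
        kernel = (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(theta - tp) + r * r)
        total += kernel * f(tp)
    return total * h / (2.0 * math.pi)


if __name__ == "__main__":
    # Should agree closely with 0.5*cos(0.3), the harmonic function r*cos(theta).
    print(poisson_disc(math.cos, 0.5, 0.3), 0.5 * math.cos(0.3))
```

As $r$ approaches unity the kernel becomes sharply peaked, and more quadrature points are needed to resolve it.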

If we set $r = 0$ in the Poisson formula, we find

$$ \chi(0,\theta) = \frac{1}{2\pi}\int_0^{2\pi} f(\theta')\,d\theta'. \tag{6.165} $$

We deduce that if $\nabla^2\chi = 0$ in some domain then the value of $\chi$ at a point
in the domain is the average of its values on any circle centred on the chosen
point and lying wholly in the domain.

This average-value property means that $\chi$ can have no local maxima or
minima within $\Omega$. The same result holds in $\mathbb{R}^n$, and a formal theorem to this
effect can be proved:
Theorem (The mean-value theorem for harmonic functions): If $\chi$ is harmonic
($\nabla^2\chi = 0$) within the bounded (open, connected) domain $\Omega \subset \mathbb{R}^n$, and is
continuous on its closure $\overline{\Omega}$, and if $m \le \chi \le M$ on $\partial\Omega$, then $m < \chi < M$
within $\Omega$, unless, that is, $m = M$, when $\chi = m$ is constant.


Pie-shaped regions 


Figure 6.19: A pie-shaped region of opening angle a. 


Electrostatics problems involving regions with corners can often be under- 
stood by solving Laplace’s equation within a pie-shaped region. 




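Figure 6.19 shows a pie-shaped region of opening angle $\alpha$ and radius $R$.
If the boundary value of the potential is zero on the wedge and non-zero on
the boundary arc, we can seek solutions as a sum of $r$, $\theta$ separated terms

$$ \varphi(r,\theta) = \sum_{n=1}^{\infty} a_n\left(\frac{r}{R}\right)^{n\pi/\alpha}\sin\left(\frac{n\pi\theta}{\alpha}\right). \tag{6.166} $$

Here the trigonometric function is not $2\pi$ periodic, but instead has been
constructed so as to make $\varphi$ vanish at $\theta = 0$ and $\theta = \alpha$. These solutions
show that close to the edge of a conducting wedge of external opening angle
$\alpha$, the surface charge density $\sigma$ usually varies as $\sigma(r) \propto r^{\pi/\alpha - 1}$, since the
leading $n = 1$ term has $\sigma \propto r^{-1}\,\partial\varphi/\partial\theta$ at the surface.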

If we have non-zero boundary data on the edge of the wedge at $\theta = \alpha$,
but have $\varphi = 0$ on the edge at $\theta = 0$ and on the curved arc $r = R$, then the
solutions can be expressed as a continuous sum of $r$, $\theta$ separated terms

$$
\begin{aligned}
\varphi(r,\theta) &= \frac{1}{2i}\int_0^{\infty} a(\nu)\left[\left(\frac{r}{R}\right)^{i\nu} - \left(\frac{r}{R}\right)^{-i\nu}\right]\frac{\sinh(\nu\theta)}{\sinh(\nu\alpha)}\,d\nu\\
 &= \int_0^{\infty} a(\nu)\,\sin[\nu\ln(r/R)]\,\frac{\sinh(\nu\theta)}{\sinh(\nu\alpha)}\,d\nu.
\end{aligned}
\tag{6.167}
$$

The Mellin sine transformation can be used to compute the coefficient
function $a(\nu)$. This transformation lets us write

$$ f(r) = \frac{2}{\pi}\int_0^{\infty} F(\nu)\sin(\nu\ln r)\,d\nu, \qquad 0 < r < 1, \tag{6.168} $$
where
$$ F(\nu) = \int_0^{1}\sin(\nu\ln r)\,f(r)\,\frac{dr}{r}. \tag{6.169} $$

The Mellin sine transformation is a disguised version of the Fourier sine
transform of functions on $[0,\infty)$. We simply map the positive $x$ axis onto
the interval $(0,1]$ by the change of variables $x = -\ln r$.

Despite its complexity when expressed in terms of these formulae, the
simple solution $\varphi(r,\theta) = a\theta$ is often the physically relevant one when the two
sides of the wedge are held at different potentials and the potential is allowed
to vary on the curved arc.

Example: Consider a pie-shaped region of opening angle $\pi$ and radius $R = \infty$.
This region can be considered to be the upper half-plane. Suppose that
we are told that the positive $x$ axis is held at potential $+1/2$ and the negative
$x$ axis is at potential $-1/2$, and are required to find the potential for positive
$y$. If we separate Laplace's equation in cartesian co-ordinates and match to
the boundary data on the $x$-axis, we end up with

$$ \varphi_{xy}(x,y) = \frac{1}{\pi}\int_0^{\infty}\frac{1}{k}\,e^{-ky}\sin(kx)\,dk. $$
On the other hand, the function

$$ \varphi_{r\theta}(r,\theta) = \frac{1}{\pi}\left(\frac{\pi}{2} - \theta\right) $$

satisfies both Laplace's equation and the boundary data. At this point we
ought to worry that we do not have enough data to determine the solution
uniquely (nothing was said in the statement of the problem about the
behavior of $\varphi$ on the boundary arc at infinity), but a little effort shows that

$$ \frac{1}{\pi}\int_0^{\infty}\frac{1}{k}\,e^{-ky}\sin(kx)\,dk = \frac{1}{\pi}\tan^{-1}\left(\frac{x}{y}\right) = \frac{1}{\pi}\left(\frac{\pi}{2} - \theta\right), \tag{6.170} $$

and so the two expressions for $\varphi(x,y)$ are equal.


6.5.3 Eigenfunction expansions 


Elliptic operators are the natural analogues of the one-dimensional linear 
differential operators we studied in earlier chapters. 

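The operator $L = -\nabla^2$ is formally self-adjoint with respect to the inner
product

$$ \langle\phi,\chi\rangle = \iint \phi^*\chi\,dx\,dy. \tag{6.171} $$

This property follows from Green's identity

$$ \iint_\Omega\left\{\phi^*(-\nabla^2\chi) - (-\nabla^2\phi)^*\chi\right\}dx\,dy = \oint_{\partial\Omega}\left\{\phi^*(-\nabla\chi) - (-\nabla\phi)^*\chi\right\}\cdot\mathbf{n}\,ds \tag{6.172} $$
where $\partial\Omega$ is the boundary of the region $\Omega$ and $\mathbf{n}$ is the outward normal on
the boundary.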
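The method of separation of variables also allows us to solve eigenvalue
problems involving the Laplace operator. For example, the Dirichlet eigenvalue
problem requires us to find the eigenfunctions and eigenvalues of the
operator

$$ L = -\nabla^2, \qquad \mathcal{D}(L) = \{\phi\in L^2[\Omega] : \phi = 0 \ \text{on}\ \partial\Omega\}. \tag{6.173} $$

Suppose $\Omega$ is the rectangle $0 \le x \le L_x$, $0 \le y \le L_y$. The normalized
eigenfunctions are
$$ \phi_{n,m}(x,y) = \sqrt{\frac{4}{L_xL_y}}\,\sin\left(\frac{n\pi x}{L_x}\right)\sin\left(\frac{m\pi y}{L_y}\right), \tag{6.174} $$
with eigenvalues
$$ \lambda_{n,m} = \left(\frac{n^2\pi^2}{L_x^2}\right) + \left(\frac{m^2\pi^2}{L_y^2}\right). \tag{6.175} $$

The eigenfunctions are orthonormal,

$$ \int\phi_{n,m}\,\phi_{n',m'}\,dx\,dy = \delta_{nn'}\delta_{mm'}, \tag{6.176} $$

and complete. Thus, any function in $L^2[\Omega]$ can be expanded as

$$ f(x,y) = \sum_{m,n=1}^{\infty} A_{nm}\,\phi_{n,m}(x,y), \tag{6.177} $$
where, by orthonormality,
$$ A_{nm} = \iint \phi_{n,m}(x,y)\,f(x,y)\,dx\,dy. $$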


We can find a complete set of eigenfunctions in product form whenever we
can separate the Laplace operator in a system of co-ordinates $\xi_i$ such that the
boundary becomes $\xi_i = \text{const}$. Completeness in the multidimensional space
is then guaranteed by the completeness of the eigenfunctions of each
one-dimensional differential operator. For other than rectangular co-ordinates,
however, the separated eigenfunctions are not elementary functions.

The Laplacian has a complete set of Dirichlet eigenfunctions in any region, 
but in general these eigenfunctions cannot be written as separated products 
of one-dimensional functions. 
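
For the rectangle, the orthonormality of the product eigenfunctions and the eigenvalue formula can be confirmed with a few lines of Python. This is only a sketch: the helper names `eigenvalue` and `overlap`, the side lengths, and the use of a midpoint quadrature rule are assumptions made for illustration. Because the eigenfunctions are products, the two-dimensional overlap integral factorizes into two one-dimensional ones.

```python
import math


def eigenvalue(n, m, lx, ly):
    """Dirichlet eigenvalue of -Laplacian on the rectangle [0,lx] x [0,ly]."""
    return (n * math.pi / lx) ** 2 + (m * math.pi / ly) ** 2


def _overlap_1d(n, n2, length, samples):
    """Midpoint-rule value of (2/length) * integral of sin*sin over [0,length]."""
    h = length / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * h
        total += (math.sin(n * math.pi * x / length)
                  * math.sin(n2 * math.pi * x / length))
    return 2.0 / length * total * h


def overlap(n, m, n2, m2, lx=1.0, ly=2.0, samples=64):
    """Inner product of the normalized eigenfunctions (n,m) and (n2,m2)."""
    return _overlap_1d(n, n2, lx, samples) * _overlap_1d(m, m2, ly, samples)


if __name__ == "__main__":
    print(overlap(1, 2, 1, 2))   # close to 1
    print(overlap(1, 2, 3, 2))   # close to 0
    print(eigenvalue(1, 1, 1.0, 1.0), 2 * math.pi ** 2)
```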
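6.5.4 Green functions


Once we know the eigenfunctions $\varphi_n$ and eigenvalues $\lambda_n$ for $-\nabla^2$ in a region
$\Omega$, we can write down the Green function as

$$ g(\mathbf{r},\mathbf{r}') = \sum_n\frac{1}{\lambda_n}\,\varphi_n(\mathbf{r})\,\varphi_n^*(\mathbf{r}'). $$

For example, the Green function for the Laplacian in the entire $\mathbb{R}^n$ is given
by the sum over eigenfunctions

$$ g(\mathbf{r},\mathbf{r}') = \int\frac{d^nk}{(2\pi)^n}\,\frac{e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')}}{k^2}. \tag{6.179} $$

Thus

$$ -\nabla^2_{\mathbf{r}}\,g(\mathbf{r},\mathbf{r}') = \int\frac{d^nk}{(2\pi)^n}\,e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')} = \delta^n(\mathbf{r}-\mathbf{r}'). \tag{6.180} $$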

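We can evaluate the integral for any $n$ by using Schwinger's trick to turn the
integrand into a Gaussian:

$$
\begin{aligned}
g(\mathbf{r},\mathbf{r}') &= \int\frac{d^nk}{(2\pi)^n}\int_0^{\infty}ds\,e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')}\,e^{-sk^2}\\
 &= \int_0^{\infty}ds\left(\sqrt{\frac{\pi}{s}}\right)^{n}\frac{1}{(2\pi)^n}\,e^{-\frac{1}{4s}|\mathbf{r}-\mathbf{r}'|^2}\\
 &= \frac{1}{2^n\pi^{n/2}}\int_0^{\infty}dt\,t^{\frac{n}{2}-2}\,e^{-t|\mathbf{r}-\mathbf{r}'|^2/4}\\
 &= \frac{1}{2^n\pi^{n/2}}\,\Gamma\left(\frac{n}{2}-1\right)\left(\frac{|\mathbf{r}-\mathbf{r}'|^2}{4}\right)^{1-n/2}\\
 &= \frac{1}{(n-2)S_{n-1}}\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right)^{n-2}.
\end{aligned}
\tag{6.181}
$$

Here, $\Gamma(x)$ is Euler's gamma function:

$$ \Gamma(x) = \int_0^{\infty}dt\,t^{x-1}e^{-t}, \tag{6.182} $$
and
$$ S_{n-1} = \frac{2\pi^{n/2}}{\Gamma(n/2)} \tag{6.183} $$
is the surface area of the n-dimensional unit ball.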
For three dimensions we find

$$ g(\mathbf{r},\mathbf{r}') = \frac{1}{4\pi}\,\frac{1}{|\mathbf{r}-\mathbf{r}'|}, \qquad n = 3. \tag{6.184} $$
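
The closed form (6.181), together with (6.183), is simple to evaluate for any $n \ne 2$. The Python sketch below does so using the standard-library gamma function; the names `sphere_area` and `green_free` are illustrative and not taken from the text. For $n = 3$ it should reproduce the familiar $1/(4\pi|\mathbf{r}-\mathbf{r}'|)$.

```python
import math


def sphere_area(n):
    """S_{n-1} = 2*pi^(n/2)/Gamma(n/2), as in (6.183)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)


def green_free(n, dist):
    """Free-space Green function of -Laplacian in R^n at separation dist > 0.

    Implements the last line of (6.181); the case n = 2 needs the
    logarithm and is deliberately excluded here.
    """
    if n == 2:
        raise ValueError("n = 2 requires the logarithmic Green function")
    return 1.0 / ((n - 2) * sphere_area(n) * dist ** (n - 2))


if __name__ == "__main__":
    print(sphere_area(3), 4.0 * math.pi)            # area of the unit 2-sphere
    print(green_free(3, 2.0), 1.0 / (8.0 * math.pi))
```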


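In two dimensions the Fourier integral is divergent for small $k$. We may
control this divergence by using dimensional regularization. We pretend that
$n$ is a continuous variable and use

$$ \Gamma(x) = \frac{1}{x}\,\Gamma(x+1) \tag{6.185} $$

together with
$$ a^x = e^{x\ln a} = 1 + x\ln a + \cdots \tag{6.186} $$

to examine the behaviour of $g(\mathbf{r},\mathbf{r}')$ near $n = 2$:

$$
\begin{aligned}
g(\mathbf{r},\mathbf{r}') &= \frac{1}{4\pi}\,\frac{\Gamma(n/2)}{(n/2-1)}\left(1 - (n/2-1)\ln(\pi|\mathbf{r}-\mathbf{r}'|^2) + O\left[(n-2)^2\right]\right)\\
 &= \frac{1}{4\pi}\left(\frac{1}{n/2-1} - 2\ln|\mathbf{r}-\mathbf{r}'| - \ln\pi - \gamma + \cdots\right).
\end{aligned}
\tag{6.187}
$$

Here $\gamma = -\Gamma'(1) = 0.57721\ldots$ is the Euler-Mascheroni constant. Although
the pole $1/(n-2)$ blows up at $n = 2$, it is independent of position. We simply
absorb it, and the $-\ln\pi - \gamma$, into an undetermined additive constant. Once
we have done this, the limit $n \to 2$ can be taken and we find

$$ g(\mathbf{r},\mathbf{r}') = -\frac{1}{2\pi}\ln|\mathbf{r}-\mathbf{r}'| + \text{const.}, \qquad n = 2. \tag{6.188} $$

The constant does not affect the Green-function property, so we can choose
any convenient value for it.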

Although we have managed to sweep the small-$k$ divergence of the Fourier
integral under a rug, the hidden infinity still has the capacity to cause
problems. The Green function in $\mathbb{R}^3$ allows us to solve for $\varphi(\mathbf{r})$ in the equation

$$ -\nabla^2\varphi = q(\mathbf{r}), $$

with the boundary condition $\varphi(\mathbf{r}) \to 0$ as $|\mathbf{r}| \to \infty$, as

$$ \varphi(\mathbf{r}) = \int g(\mathbf{r},\mathbf{r}')\,q(\mathbf{r}')\,d^3r'. $$

In two dimensions, however we try to adjust the arbitrary constant in (6.188),
the divergence of the logarithm at infinity means that there can be no solution
to the corresponding boundary-value problem unless $\int q(\mathbf{r})\,d^2r = 0$. This is
not a Fredholm-alternative constraint because once the constraint is satisfied
the solution is unique. The two-dimensional problem is therefore pathological
from the viewpoint of Fredholm theory. This pathology is of the same
character as the non-existence of solutions to the three-dimensional Dirichlet
boundary-value problem with boundary spikes. The Fredholm alternative
applies, in general, only to operators with a discrete spectrum.


Exercise 6.8: Evaluate our formula for the $\mathbb{R}^n$ Laplace Green function,

$$ g(\mathbf{r},\mathbf{r}') = \frac{1}{(n-2)S_{n-1}}\,\frac{1}{|\mathbf{r}-\mathbf{r}'|^{n-2}}, $$

with $S_{n-1} = 2\pi^{n/2}/\Gamma(n/2)$, for the case $n = 1$. Show that the resulting
expression for $g(x,x')$ is not divergent, and obeys

$$ -\frac{d^2}{dx^2}\,g(x,x') = \delta(x-x'). $$

Our formula therefore makes sense as a Green function, even though the
original integral (6.179) is linearly divergent at $k = 0$! We must defer an
explanation of this miracle until we discuss analytic continuation in the context
of complex analysis.

(Hint: recall that $\Gamma(1/2) = \sqrt{\pi}$.)


6.5.5 Boundary-value problems 


We now look at how the Green function can be used to solve the interior
Dirichlet boundary-value problem in regions where the method of separation
of variables is not available. Figure 6.20 shows a bounded region $\Omega$ possessing
a smooth boundary $\partial\Omega$.

Figure 6.20: Interior Dirichlet problem.


We wish to solve $-\nabla^2\varphi = q(\mathbf{r})$ for $\mathbf{r}\in\Omega$ and with $\varphi(\mathbf{r}) = f(\mathbf{r})$ for $\mathbf{r}\in\partial\Omega$.
Suppose we have found a Green function that obeys

$$ -\nabla^2_{\mathbf{r}}\,g(\mathbf{r},\mathbf{r}') = \delta^n(\mathbf{r}-\mathbf{r}'), \quad \mathbf{r},\mathbf{r}'\in\Omega, \qquad g(\mathbf{r},\mathbf{r}') = 0, \quad \mathbf{r}\in\partial\Omega. \tag{6.189} $$


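We first show that $g(\mathbf{r},\mathbf{r}') = g(\mathbf{r}',\mathbf{r})$ by the same methods we used for
one-dimensional self-adjoint operators. Next we follow the strategy that we used
for one-dimensional inhomogeneous differential equations: we use Lagrange's
identity (in this context called Green's theorem) to write

$$
\begin{aligned}
\int_\Omega d^nr\,&\left\{g(\mathbf{r},\mathbf{r}')\nabla^2_{\mathbf{r}}\varphi(\mathbf{r}) - \varphi(\mathbf{r})\nabla^2_{\mathbf{r}}g(\mathbf{r},\mathbf{r}')\right\}\\
 &= \int_{\partial\Omega}d\mathbf{S}_{\mathbf{r}}\cdot\left\{g(\mathbf{r},\mathbf{r}')\nabla_{\mathbf{r}}\varphi(\mathbf{r}) - \varphi(\mathbf{r})\nabla_{\mathbf{r}}g(\mathbf{r},\mathbf{r}')\right\},
\end{aligned}
\tag{6.190}
$$

where $d\mathbf{S}_{\mathbf{r}} = \mathbf{n}\,dS_{\mathbf{r}}$, with $\mathbf{n}$ the outward normal to $\partial\Omega$ at the point $\mathbf{r}$. The
left hand side is

$$
\begin{aligned}
\text{L.H.S.} &= \int_\Omega d^nr\left\{-g(\mathbf{r},\mathbf{r}')q(\mathbf{r}) + \varphi(\mathbf{r})\delta^n(\mathbf{r}-\mathbf{r}')\right\}\\
 &= -\int_\Omega d^nr\,g(\mathbf{r},\mathbf{r}')\,q(\mathbf{r}) + \varphi(\mathbf{r}')\\
 &= -\int_\Omega d^nr\,g(\mathbf{r}',\mathbf{r})\,q(\mathbf{r}) + \varphi(\mathbf{r}').
\end{aligned}
\tag{6.191}
$$

On the right hand side, the boundary condition on $g(\mathbf{r},\mathbf{r}')$ makes the first
term zero, so

$$ \text{R.H.S.} = -\int_{\partial\Omega}dS_{\mathbf{r}}\,f(\mathbf{r})\,(\mathbf{n}\cdot\nabla_{\mathbf{r}})g(\mathbf{r},\mathbf{r}'). \tag{6.192} $$
Therefore,

$$ \varphi(\mathbf{r}') = \int_\Omega g(\mathbf{r}',\mathbf{r})\,q(\mathbf{r})\,d^nr - \int_{\partial\Omega} f(\mathbf{r})\,(\mathbf{n}\cdot\nabla_{\mathbf{r}})g(\mathbf{r},\mathbf{r}')\,dS_{\mathbf{r}}. \tag{6.193} $$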


In the language of chapter 3, the first term is a particular integral and the 
second (the boundary integral term) is the complementary function. 


Exercise 6.9: Assume that the boundary is a smooth surface. Show that the
limit of $\varphi(\mathbf{r}')$ as $\mathbf{r}'$ approaches the boundary is indeed consistent with the
boundary data $f(\mathbf{r}')$. (Hint: When $\mathbf{r}$, $\mathbf{r}'$ are very close to it, the boundary can
be approximated by a straight line segment, and so $g(\mathbf{r},\mathbf{r}')$ can be found by
the method of images.)


Figure 6.21: Exterior Dirichlet problem. 


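A similar method works for the exterior Dirichlet problem shown in figure
6.21. In this case we seek a Green function obeying

$$ -\nabla^2_{\mathbf{r}}\,g(\mathbf{r},\mathbf{r}') = \delta^n(\mathbf{r}-\mathbf{r}'), \quad \mathbf{r},\mathbf{r}'\in\mathbb{R}^n\setminus\Omega, \qquad g(\mathbf{r},\mathbf{r}') = 0, \quad \mathbf{r}\in\partial\Omega. \tag{6.194} $$

(The notation $\mathbb{R}^n\setminus\Omega$ means the region outside $\Omega$.) We also impose a further
boundary condition by requiring $g(\mathbf{r},\mathbf{r}')$, and hence $\varphi(\mathbf{r})$, to tend to zero as
$|\mathbf{r}| \to \infty$. The final formula for $\varphi(\mathbf{r})$ is the same except for the region of
integration and the sign of the boundary term.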

The hard part of both the interior and exterior problems is to find the 
Green function for the given domain. 
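Exercise 6.10: Suppose that $\varphi(x,y)$ is harmonic in the half-plane $y > 0$, tends
to zero as $y \to \infty$, and takes the values $f(x)$ on the boundary $y = 0$. Show
that
$$ \varphi(x,y) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{y}{(x-x')^2 + y^2}\,f(x')\,dx', \qquad y > 0. $$

Deduce that the "energy" functional

$$ S[f] \stackrel{\text{def}}{=} \frac{1}{2}\int_{y>0}|\nabla\varphi|^2\,dx\,dy \equiv -\frac{1}{2}\int_{-\infty}^{\infty} f(x)\left.\frac{\partial\varphi}{\partial y}\right|_{y=0}dx $$

can be expressed as

$$ S[f] = \frac{1}{4\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left\{\frac{f(x') - f(x)}{x' - x}\right\}^2 dx'\,dx. $$

The non-local functional $S[f]$ appears in the quantum version of the
Caldeira-Leggett model. See also exercise 2.24.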


Method of Images 


When $\partial\Omega$ is a sphere or a circle we can find the Dirichlet Green functions for
the region $\Omega$ by using the method of images.


Figure 6.22: Points inverse with respect to a circle. 


Figure 6.22 shows a circle of radius $R$. Given a point $B$ outside the circle,
and a point $X$ on the circle, we construct $A$ inside and on the line $OB$, so
that $\angle OBX = \angle OXA$. We now observe that $\triangle XOA$ is similar to $\triangle BOX$,
and so

$$ \frac{OA}{OX} = \frac{OX}{OB}. \tag{6.195} $$
Thus, $OA \times OB = (OX)^2 = R^2$. The points $A$ and $B$ are therefore mutually
inverse with respect to the circle. In particular, the point $A$ does not depend
on which point $X$ was chosen.

Now let $AX = r_1$, $BX = r_2$ and $OB = B$. Then, using similar triangles
again, we have

$$ \frac{AX}{OX} = \frac{BX}{OB}, \tag{6.196} $$
or
$$ \frac{R}{r_1} = \frac{B}{r_2}, \tag{6.197} $$
and so
$$ \frac{1}{r_1}\left(\frac{R}{B}\right) - \frac{1}{r_2} = 0. \tag{6.198} $$

Interpreting the figure as a slice through the centre of a sphere of radius $R$,
we see that if we put a unit charge at $B$, then the insertion of an image charge
of magnitude $q = -R/B$ at $A$ serves to keep the entire surface of the
sphere at zero potential.

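Thus, in three dimensions, and with $\Omega$ the region exterior to the sphere,
the Dirichlet Green function is

$$ g_\Omega(\mathbf{r},\mathbf{r}_B) = \frac{1}{4\pi}\left(\frac{1}{|\mathbf{r}-\mathbf{r}_B|} - \left(\frac{R}{|\mathbf{r}_B|}\right)\frac{1}{|\mathbf{r}-\mathbf{r}_A|}\right). \tag{6.199} $$

In two dimensions, we find similarly that

$$ g_\Omega(\mathbf{r},\mathbf{r}_B) = -\frac{1}{2\pi}\Big(\ln|\mathbf{r}-\mathbf{r}_B| - \ln|\mathbf{r}-\mathbf{r}_A| - \ln\left(|\mathbf{r}_B|/R\right)\Big) \tag{6.200} $$

has $g_\Omega(\mathbf{r},\mathbf{r}_B) = 0$ for $\mathbf{r}$ on the circle. Thus, this is the Dirichlet Green function
for $\Omega$, the region exterior to the circle.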

We can use the same method to construct the interior Green functions 
for the sphere and circle. 
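
The image construction for the sphere is easy to test numerically. The following Python sketch evaluates the three-dimensional exterior Green function (6.199), placing the image point at the inverse point $\mathbf{r}_A = R^2\mathbf{r}_B/|\mathbf{r}_B|^2$; the function name `image_green_exterior` and the sample points are hypothetical choices made for illustration. On the surface of the sphere the two terms should cancel.

```python
import math


def image_green_exterior(r, r_b, radius):
    """Exterior Dirichlet Green function (6.199) for a sphere of given radius.

    r      : field point (x, y, z)
    r_b    : source point outside the sphere
    radius : sphere radius R, with the sphere centred on the origin
    """
    b2 = sum(c * c for c in r_b)          # |r_B|^2
    b = math.sqrt(b2)
    # Image point: inverse of r_B with respect to the sphere.
    r_a = tuple(radius ** 2 * c / b2 for c in r_b)
    d_b = math.dist(r, r_b)
    d_a = math.dist(r, r_a)
    return (1.0 / d_b - (radius / b) / d_a) / (4.0 * math.pi)


if __name__ == "__main__":
    source = (0.5, -1.0, 2.0)
    # A point on the unit sphere: the Green function should vanish there.
    on_sphere = (math.sin(0.7) * math.cos(1.2),
                 math.sin(0.7) * math.sin(1.2),
                 math.cos(0.7))
    print(image_green_exterior(on_sphere, source, 1.0))
    print(image_green_exterior((0.0, 0.0, 3.0), source, 1.0))
```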


6.5.6 Kirchhoff vs. Huygens 


Even if we do not have a Green function tailored for the specific region in
which we are interested, we can still use the whole-space Green function
to convert the differential equation into an integral equation, and so make 
progress. An example of this technique is provided by Kirchhoff’s partial 
justification of Huygens’ construction. 
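The Green function $G(\mathbf{r},\mathbf{r}')$ for the elliptic Helmholtz equation

$$ (-\nabla^2 + \kappa^2)\,G(\mathbf{r},\mathbf{r}') = \delta^3(\mathbf{r}-\mathbf{r}') \tag{6.201} $$
in $\mathbb{R}^3$ is given by
$$ G(\mathbf{r},\mathbf{r}') = \int\frac{d^3k}{(2\pi)^3}\,\frac{e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')}}{k^2+\kappa^2} = \frac{1}{4\pi|\mathbf{r}-\mathbf{r}'|}\,e^{-\kappa|\mathbf{r}-\mathbf{r}'|}. \tag{6.202} $$

Exercise 6.11: Perform the $k$ integration and confirm this.


For solutions of the wave equation with $e^{-i\omega t}$ time dependence, we want
a Green function such that

$$ \left[-\nabla^2 - \left(\frac{\omega^2}{c^2}\right)\right]G(\mathbf{r},\mathbf{r}') = \delta^3(\mathbf{r}-\mathbf{r}'), \tag{6.203} $$

and so we have to take $\kappa^2$ negative. We therefore have two possible Green
functions
$$ G_\pm(\mathbf{r},\mathbf{r}') = \frac{1}{4\pi|\mathbf{r}-\mathbf{r}'|}\,e^{\pm ik|\mathbf{r}-\mathbf{r}'|}, \tag{6.204} $$

where $k = |\omega|/c$. These correspond to taking the real part of $\kappa^2$ negative, but
giving it an infinitesimal imaginary part, as we did when discussing resolvent
operators in chapter 5. If we want outgoing waves, we must take $G \equiv G_+$.
Now suppose we want to solve

$$ (\nabla^2 + k^2)\psi = 0 \tag{6.205} $$
in an arbitrary region $\Omega$. As before, we use Green's theorem to write
$$
\begin{aligned}
\int_\Omega&\left\{G(\mathbf{r},\mathbf{r}')(\nabla^2_{\mathbf{r}} + k^2)\psi(\mathbf{r}) - \psi(\mathbf{r})(\nabla^2_{\mathbf{r}} + k^2)G(\mathbf{r},\mathbf{r}')\right\}d^nr\\
 &= \int_{\partial\Omega}\left\{G(\mathbf{r},\mathbf{r}')\nabla_{\mathbf{r}}\psi(\mathbf{r}) - \psi(\mathbf{r})\nabla_{\mathbf{r}}G(\mathbf{r},\mathbf{r}')\right\}\cdot d\mathbf{S}_{\mathbf{r}},
\end{aligned}
\tag{6.206}
$$

where $d\mathbf{S}_{\mathbf{r}} = \mathbf{n}\,dS_{\mathbf{r}}$, with $\mathbf{n}$ the outward normal to $\partial\Omega$ at the point $\mathbf{r}$. The
left hand side is

$$ \int_\Omega\psi(\mathbf{r})\,\delta^n(\mathbf{r}-\mathbf{r}')\,d^nr = \begin{cases}\psi(\mathbf{r}'), & \mathbf{r}'\in\Omega,\\ 0, & \mathbf{r}'\notin\Omega,\end{cases} \tag{6.207} $$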
and so

$$ \psi(\mathbf{r}') = \int_{\partial\Omega}\left\{G(\mathbf{r},\mathbf{r}')(\mathbf{n}\cdot\nabla_{\mathbf{r}})\psi(\mathbf{r}) - \psi(\mathbf{r})(\mathbf{n}\cdot\nabla_{\mathbf{r}})G(\mathbf{r},\mathbf{r}')\right\}dS_{\mathbf{r}}, \qquad \mathbf{r}'\in\Omega. \tag{6.208} $$
This must not be thought of as a solution to the wave equation in terms of an
integral over the boundary, analogous to the solution (6.193) of the Dirichlet
problem that we found in the last section. Here, unlike that earlier case,
$G(\mathbf{r},\mathbf{r}')$ knows nothing of the boundary $\partial\Omega$, and so both terms in the surface
integral contribute to $\psi$. We therefore have a formula for $\psi(\mathbf{r})$ in the interior
in terms of both Dirichlet and Neumann data on the boundary $\partial\Omega$, and
giving both over-prescribes the problem. If we take arbitrary values for $\psi$
and $(\mathbf{n}\cdot\nabla)\psi$ on the boundary, and plug them into (6.208) so as to compute
$\psi(\mathbf{r})$ within $\Omega$ then there is no reason for the resulting $\psi(\mathbf{r})$ to reproduce, as $\mathbf{r}$
approaches the boundary, the values $\psi$ and $(\mathbf{n}\cdot\nabla)\psi$ appearing in the integral.
If we demand that the output $\psi(\mathbf{r})$ does reproduce the input boundary data,
then this is equivalent to demanding that the boundary data come from a
solution of the differential equation in a region encompassing $\Omega$.


Figure 6.23: Huygens’ construction. 


The mathematical inconsistency of assuming arbitrary boundary data
notwithstanding, this is exactly what we do when we follow Kirchhoff and
use (6.208) to provide a justification of Huygens' construction as used in
optics. Consider the problem of a plane wave, $\psi = e^{ikx}$, incident on a screen
from the left and passing through the aperture labelled AB in figure 6.23.
We take as the region $\Omega$ everything to the right of the obstacle. The Kirchhoff
approximation consists of assuming that the values of $\psi$ and $(\mathbf{n}\cdot\nabla)\psi$
on the surface AB are $e^{ikx}$ and $-ike^{ikx}$, the same as they would be if the
obstacle were not there, and that they are identically zero on all other parts
of the boundary. In other words, we completely ignore any scattering by
the material in which the aperture resides. We can then use our formula to
estimate $\psi$ in the region to the right of the aperture. If we further set

$$ \nabla_{\mathbf{r}}G(\mathbf{r},\mathbf{r}') \approx ik\,\frac{(\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,G(\mathbf{r},\mathbf{r}') = ik\,\frac{(\mathbf{r}-\mathbf{r}')}{4\pi|\mathbf{r}-\mathbf{r}'|^2}\,e^{ik|\mathbf{r}-\mathbf{r}'|}, \tag{6.209} $$

which is a good approximation provided we are more than a few wavelengths
away from the aperture, we find

$$ \psi(\mathbf{r}') \approx \frac{k}{4\pi i}\int_{\text{aperture}}\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,(1+\cos\theta)\,dS_{\mathbf{r}}. \tag{6.210} $$

Thus, each part of the wavefront on the surface AB acts as a source for the
diffracted wave in $\Omega$.

This result, although still an approximation, provides two substantial 
improvements to the naive form of Huygens’ construction as presented in 
elementary courses: 

i) There is a factor of $(1 + \cos\theta)$ which suppresses backward propagating
waves. The traditional exposition of Huygens' construction takes no
notice of which way the wave is going, and so provides no explanation
as to why a wavefront does not act as a source for a backward wave.

ii) There is a factor of $i^{-1} = e^{-i\pi/2}$ which corrects a 90° error in the phase
made by the naive Huygens' construction. For two-dimensional slit
geometry we must use the more complicated two-dimensional Green
function (it is a Bessel function), and this provides an $e^{-i\pi/4}$ factor
which corrects for the 45° phase error that is manifest in the Cornu
spiral of Fresnel diffraction.

For these reasons the Kirchhoff approximation is widely used.


Problem 6.12: Use the method of images to construct i) the Dirichlet, and
ii) the Neumann, Green function for the region $\Omega$, consisting of everything to
the right of the screen. Use your Green functions to write the solution to the
diffraction problem in this region a) in terms of the values of $\psi$ on the aperture
surface AB, b) in terms of the values of $(\mathbf{n}\cdot\nabla)\psi$ on the aperture surface. In
each case, assume that the boundary data are identically zero on the dark side
of the screen. Your expressions should coincide with the Rayleigh-Sommerfeld
diffraction integrals of the first and second kind, respectively.³ Explore the
differences between the predictions of these two formulae and that of Kirchhoff
for the case of the diffraction of a plane wave incident on the aperture from the
left.


6.6 Further exercises and problems 


Problem 6.13: Critical Mass. An infinite slab of fissile material has thickness
$L$. The neutron density $n(x,t)$ in the material obeys the equation

$$ \frac{\partial n}{\partial t} = D\frac{\partial^2 n}{\partial x^2} + \lambda n + \mu, $$

where $n(x,t)$ is zero at the surface of the slab at $x = 0, L$. Here, $D$ is the
neutron diffusion constant, the term $\lambda n$ describes the creation of new neutrons
by induced fission, and the constant $\mu$ is the rate of production per unit volume
of neutrons by spontaneous fission.


a) Expand $n(x,t)$ as a series,
$$ n(x,t) = \sum_m a_m(t)\,\varphi_m(x), $$

where the $\varphi_m(x)$ are a complete set of functions you think suitable for
solving the problem.

b) Find an explicit expression for the coefficients $a_m(t)$ in terms of their
initial values $a_m(0)$.

c) Determine the critical thickness $L_{\rm crit}$ above which the slab will explode.

d) Assuming that $L < L_{\rm crit}$, find the equilibrium distribution $n_{\rm eq}(x)$ of
neutrons in the slab. (You may either sum your series expansion to get an
explicit closed-form answer, or use another (Green function?) method.)


Problem 6.14: Semi-infinite Rod. Consider the heat equation
$$ \frac{\partial\theta}{\partial t} = D\nabla^2\theta, \qquad 0 < x < \infty, $$

with the temperature $\theta(x,t)$ obeying the initial condition $\theta(x,0) = \theta_0$ for
$0 < x < \infty$, and the boundary condition $\theta(0,t) = 0$.


³ M. Born and E. Wolf, Principles of Optics, 7th (expanded) edition, section 8.11.
a) Show that the boundary condition at $x = 0$ may be satisfied at all times
by introducing a suitable mirror image of the initial data in the region
$-\infty < x < 0$, and then applying the heat kernel for the entire real
line to this extended initial data. Show that the resulting solution of the
semi-infinite rod problem can be expressed in terms of the error function

$$ \operatorname{erf}(x) \stackrel{\text{def}}{=} \frac{2}{\sqrt{\pi}}\int_0^x e^{-\xi^2}\,d\xi, $$
as
$$ \theta(x,t) = \theta_0\,\operatorname{erf}\left(\frac{x}{\sqrt{4Dt}}\right). $$

b) Solve the same problem by using a Fourier integral expansion in terms
of $\sin kx$ on the half-line $0 < x < \infty$ and obtaining the time evolution
of the Fourier coefficients. Invert the transform and show that your
answer reduces to that of part a). (Hint: replace the initial condition by
$\theta(x,0) = \theta_0e^{-\epsilon x}$, so that the Fourier transform converges, and then take
the limit $\epsilon \to 0$ at the end of your calculation.)


Exercise 6.15: Seasonal Heat Waves. Suppose that the measured temperature
of the air above the arctic permafrost at time $t$ is expressed as a Fourier series

$$ \theta(t) = \theta_0 + \sum_{n=1}^{\infty}\theta_n\cos n\omega t, $$
where the period $T = 2\pi/\omega$ is one year. Solve the heat equation for the soil
temperature,
$$ \frac{\partial\theta}{\partial t} = \kappa\frac{\partial^2\theta}{\partial z^2}, \qquad 0 < z < \infty, $$
with this boundary condition, and find the temperature $\theta(z,t)$ at a depth $z$
below the surface as a function of time. Observe that the sub-surface temperature
fluctuates with the same period as that of the air, but with a phase lag
that depends on the depth. Also observe that the longest-period temperature
fluctuations penetrate the deepest into the ground. (Hint: for each Fourier
component, write $\theta$ as $\operatorname{Re}[A_n(z)\exp in\omega t]$, where $A_n$ is a complex function of
$z$.)


The next problem is an illustration of a Dirichlet principle.


Exercise 6.16: Helmholtz-Hodge decomposition. Given a three-dimensional
region $\Omega$ with smooth boundary $\partial\Omega$, introduce the real Hilbert space $L^2_{\rm vec}(\Omega)$
of finite-norm vector fields, with inner product

$$ \langle\mathbf{u},\mathbf{v}\rangle = \int_\Omega\mathbf{u}\cdot\mathbf{v}\,d^3x. $$
Consider the spaces $\mathcal{L} = \{\mathbf{v} : \mathbf{v} = \nabla\phi\}$ and $\mathcal{T} = \{\mathbf{v} : \mathbf{v} = \operatorname{curl}\mathbf{w}\}$ consisting
of vector fields in $L^2_{\rm vec}(\Omega)$ that can be written as gradients and curls,
respectively. (Strictly speaking, we should consider the completions of these
spaces.)


a) Show that if we demand that either (or both) of $\phi$ and the tangential
component of $\mathbf{w}$ vanish on $\partial\Omega$, then the two spaces $\mathcal{L}$ and $\mathcal{T}$ are mutually
orthogonal with respect to the $L^2_{\rm vec}(\Omega)$ inner product.


Let $\mathbf{u}\in L^2_{\rm vec}(\Omega)$. We will try to express $\mathbf{u}$ as the sum of a gradient and a curl
by seeking to make the distance functional
$$ F_{\mathbf{u}}[\phi,\mathbf{w}] \stackrel{\text{def}}{=} \|\mathbf{u} - \nabla\phi - \operatorname{curl}\mathbf{w}\|^2 = \int_\Omega|\mathbf{u} - \nabla\phi - \operatorname{curl}\mathbf{w}|^2\,d^3x $$
equal to zero.


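b) Show that if we find a $\mathbf{w}$ and $\phi$ that minimize $F_{\mathbf{u}}[\phi,\mathbf{w}]$, then the residual
vector field
$$ \mathbf{h} \stackrel{\text{def}}{=} \mathbf{u} - \nabla\phi - \operatorname{curl}\mathbf{w} $$
obeys $\operatorname{curl}\mathbf{h} = 0$ and $\operatorname{div}\mathbf{h} = 0$, together with boundary conditions
determined by the constraints imposed on $\phi$ and $\mathbf{w}$:

i) If $\phi$ is unconstrained on $\partial\Omega$, but the tangential boundary component
of $\mathbf{w}$ is required to vanish, then the component of $\mathbf{h}$ normal to the
boundary must be zero.

ii) If $\phi = 0$ on $\partial\Omega$, but the tangential boundary component of $\mathbf{w}$ is
unconstrained, then the tangential boundary component of $\mathbf{h}$ must
be zero.

iii) If $\phi = 0$ on $\partial\Omega$ and also the tangential boundary component of $\mathbf{w}$ is
required to vanish, then $\mathbf{h}$ need satisfy no boundary condition.

c) Assuming that we can find suitable minimizing $\phi$ and $\mathbf{w}$, deduce that
under each of the three boundary conditions of the previous part, we
have a Helmholtz-Hodge decomposition

$$ \mathbf{u} = \nabla\phi + \operatorname{curl}\mathbf{w} + \mathbf{h} $$

into unique parts that are mutually $L^2_{\rm vec}(\Omega)$ orthogonal. Observe that
the residual vector field $\mathbf{h}$ is harmonic, i.e. it satisfies the equation
$\nabla^2\mathbf{h} = 0$, where

$$ \nabla^2\mathbf{h} \stackrel{\text{def}}{=} \nabla(\operatorname{div}\mathbf{h}) - \operatorname{curl}(\operatorname{curl}\mathbf{h}) $$

is the vector Laplacian acting on $\mathbf{h}$.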
If $\mathbf{u}$ is sufficiently smooth, there will exist $\phi$ and $\mathbf{w}$ that minimize the distance
$\|\mathbf{u} - \nabla\phi - \operatorname{curl}\mathbf{w}\|$ and satisfy the boundary conditions. Whether or not $\mathbf{h}$ is
needed in the decomposition is another matter. It depends both on how we
constrain $\phi$ and $\mathbf{w}$, and on the topology of $\Omega$. At issue is whether or not the
boundary conditions imposed on $\mathbf{h}$ are sufficient to force it to be zero. If $\Omega$
is the interior of a torus, for example, then $\mathbf{h}$ can be non-zero whenever its
tangential component is unconstrained.


The Helmholtz-Hodge decomposition is closely related to the vector-field 
eigenvalue problems commonly met with in electromagnetism or elasticity. 
The next few exercises lead up to this connection. 


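Exercise 6.17: Self-adjointness and the vector Laplacian. Consider the vector
Laplacian (defined in the previous problem) as a linear operator on the Hilbert
space $L^2_{\rm vec}(\Omega)$.


a) Show that

$$
\begin{aligned}
\int_\Omega d^3x\left\{\mathbf{u}\cdot(\nabla^2\mathbf{v}) - \mathbf{v}\cdot(\nabla^2\mathbf{u})\right\} = \int_{\partial\Omega}\big\{&(\mathbf{n}\cdot\mathbf{u})\operatorname{div}\mathbf{v} - (\mathbf{n}\cdot\mathbf{v})\operatorname{div}\mathbf{u}\\
 &- \mathbf{u}\cdot(\mathbf{n}\times\operatorname{curl}\mathbf{v}) + \mathbf{v}\cdot(\mathbf{n}\times\operatorname{curl}\mathbf{u})\big\}\,dS.
\end{aligned}
$$

b) Deduce from the identity in part a) that the domain of $\nabla^2$ coincides
with the domain of $(\nabla^2)^\dagger$, and hence the vector Laplacian defines a truly
self-adjoint operator with a complete set of mutually orthogonal
eigenfunctions, when we take as boundary conditions one of the following:

o) Dirichlet-Dirichlet: $\mathbf{n}\cdot\mathbf{u} = 0$ and $\mathbf{n}\times\mathbf{u} = 0$ on $\partial\Omega$,
i) Dirichlet-Neumann: $\mathbf{n}\cdot\mathbf{u} = 0$ and $\mathbf{n}\times\operatorname{curl}\mathbf{u} = 0$ on $\partial\Omega$,
ii) Neumann-Dirichlet: $\operatorname{div}\mathbf{u} = 0$ and $\mathbf{n}\times\mathbf{u} = 0$ on $\partial\Omega$,
iii) Neumann-Neumann: $\operatorname{div}\mathbf{u} = 0$ and $\mathbf{n}\times\operatorname{curl}\mathbf{u} = 0$ on $\partial\Omega$.

c) Show that the more general Robin boundary conditions

$$
\begin{aligned}
\alpha(\mathbf{n}\cdot\mathbf{u}) + \beta\operatorname{div}\mathbf{u} &= 0,\\
\lambda(\mathbf{n}\times\mathbf{u}) + \mu(\mathbf{n}\times\operatorname{curl}\mathbf{u}) &= 0,
\end{aligned}
$$

where $\alpha$, $\beta$, $\lambda$, $\mu$ can be position dependent, also give rise to a truly
self-adjoint operator.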

Problem 6.18: Cavity electrodynamics and the Hodge-Weyl decomposition. 
Each of the self-adjoint boundary conditions in the previous problem gives 
rise to a complete set of mutually orthogonal vector eigenfunctions obeying 


$$
-\nabla^2 {\bf u}_n = k_n^2 {\bf u}_n.
$$

For these eigenfunctions to describe the normal modes of the electric field ${\bf E}$
and the magnetic field ${\bf B}$ (which we identify with ${\bf H}$ as we will use units in
which $\mu_0 = \epsilon_0 = 1$) within a cavity bounded by a perfect conductor, we need
to additionally impose the Maxwell equations ${\rm div}\,{\bf B} = {\rm div}\,{\bf E} = 0$ everywhere
within $\Omega$, and to satisfy the perfect-conductor boundary conditions
${\bf n}\times{\bf E} = {\bf n}\cdot{\bf B} = 0$.


a) For each eigenfunction ${\bf u}_n$ corresponding to a non-zero eigenvalue $k_n^2$,
define

$$
{\bf v}_n = \frac{1}{k_n^2}\,{\rm curl}\,({\rm curl}\,{\bf u}_n), \qquad
{\bf w}_n = -\frac{1}{k_n^2}\,\nabla({\rm div}\,{\bf u}_n),
$$

so that ${\bf u}_n = {\bf v}_n + {\bf w}_n$. Show that ${\bf v}_n$ and ${\bf w}_n$ are, if non-zero, each
eigenfunctions of $-\nabla^2$ with eigenvalue $k_n^2$. The vector eigenfunctions
that are not in the null-space of $\nabla^2$ can therefore be decomposed into
their transverse (the ${\bf v}_n$, which obey ${\rm div}\,{\bf v}_n = 0$) and longitudinal (the
${\bf w}_n$, which obey ${\rm curl}\,{\bf w}_n = 0$) parts. However, it is not immediately clear
what boundary conditions the ${\bf v}_n$ and ${\bf w}_n$ separately obey.
b) The boundary-value problems of relevance to electromagnetism are: 


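i) $-\nabla^2 {\bf h}_n = k_n^2 {\bf h}_n$ within $\Omega$,
   ${\bf n}\cdot{\bf h}_n = 0$, ${\bf n}\times{\rm curl}\,{\bf h}_n = 0$ on $\partial\Omega$;
ii) $-\nabla^2 {\bf e}_n = k_n^2 {\bf e}_n$ within $\Omega$,
   ${\rm div}\,{\bf e}_n = 0$, ${\bf n}\times{\bf e}_n = 0$ on $\partial\Omega$;
iii) $-\nabla^2 {\bf b}_n = k_n^2 {\bf b}_n$ within $\Omega$,
   ${\rm div}\,{\bf b}_n = 0$, ${\bf n}\times{\rm curl}\,{\bf b}_n = 0$ on $\partial\Omega$.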


These problems involve, respectively, the Dirichlet--Neumann, Neumann- 
Dirichlet, and Neumann-Neumann boundary conditions from the previ- 
ous problem. 

Show that the divergence-free transverse eigenfunctions 


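$$
{\bf H}_n \stackrel{\rm def}{=} \frac{1}{k_n^2}\,{\rm curl}\,({\rm curl}\,{\bf h}_n), \qquad
{\bf E}_n \stackrel{\rm def}{=} \frac{1}{k_n^2}\,{\rm curl}\,({\rm curl}\,{\bf e}_n), \qquad
{\bf B}_n \stackrel{\rm def}{=} \frac{1}{k_n^2}\,{\rm curl}\,({\rm curl}\,{\bf b}_n),
$$

obey ${\bf n}\cdot{\bf H}_n = {\bf n}\times{\bf E}_n = {\bf n}\times{\rm curl}\,{\bf B}_n = 0$ on the boundary, and that from
these and the eigenvalue equations we can deduce that
${\bf n}\times{\rm curl}\,{\bf H}_n = {\bf n}\cdot{\bf B}_n = {\bf n}\cdot{\rm curl}\,{\bf E}_n = 0$ on the boundary. The
perfect-conductor boundary conditions are therefore satisfied.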

Also show that the corresponding longitudinal eigenfunctions 


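$$
\boldsymbol{\eta}_n \stackrel{\rm def}{=} -\frac{1}{k_n^2}\,\nabla({\rm div}\,{\bf h}_n), \qquad
\boldsymbol{\epsilon}_n \stackrel{\rm def}{=} -\frac{1}{k_n^2}\,\nabla({\rm div}\,{\bf e}_n), \qquad
\boldsymbol{\beta}_n \stackrel{\rm def}{=} -\frac{1}{k_n^2}\,\nabla({\rm div}\,{\bf b}_n)
$$

obey the boundary conditions ${\bf n}\cdot\boldsymbol{\eta}_n = {\bf n}\times\boldsymbol{\epsilon}_n = {\bf n}\times\boldsymbol{\beta}_n = 0$.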
c) By considering the counter-example provided by a rectangular box, show
that the Dirichlet-Dirichlet boundary condition is not compatible with a 
longitudinal+transverse decomposition. (A purely transverse wave inci- 
dent on such a boundary will, on reflection, acquire a longitudinal com- 
ponent.) 

d) Show that 


$$
0 = \int_\Omega \boldsymbol{\eta}_n\cdot{\bf H}_m\, d^3x
= \int_\Omega \boldsymbol{\epsilon}_n\cdot{\bf E}_m\, d^3x
= \int_\Omega \boldsymbol{\beta}_n\cdot{\bf B}_m\, d^3x,
$$

but that the ${\bf v}_n$ and ${\bf w}_n$ obtained from the Dirichlet-Dirichlet boundary
condition ${\bf u}_n$’s are not in general orthogonal to each other. Use the
continuity of the $L^2_{\rm vec}(\Omega)$ inner product

$$
{\bf x}_n \to {\bf x} \;\Rightarrow\; \langle {\bf x}_n, {\bf y}\rangle \to \langle {\bf x}, {\bf y}\rangle
$$

to show that this individual-eigenfunction orthogonality is retained by
limits of sums of the eigenfunctions. Deduce that, for each of the boundary
conditions i)-iii) (but not for the Dirichlet-Dirichlet case), we have
the Hodge-Weyl decomposition of $L^2_{\rm vec}(\Omega)$ as the orthogonal direct sum

$$
L^2_{\rm vec}(\Omega) = {\mathcal L} \oplus {\mathcal T} \oplus {\mathcal N},
$$

where ${\mathcal L}$, ${\mathcal T}$ are respectively the spaces of functions representable as
infinite sums of the longitudinal and transverse eigenfunctions, and ${\mathcal N}$ is
the finite-dimensional space of harmonic (nullspace) eigenfunctions.


Complete sets of vector eigenfunctions for the interior of a rectangular box, 
and for each of the four sets of boundary conditions we have considered, can 
be found in Morse and Feshbach §13.1.


Problem 6.19: Hodge-Weyl and Helmholtz-Hodge. In this exercise we consider
the problem of what classes of vector-valued functions can be expanded in
terms of the various families of eigenfunctions of the previous problem. It is
tempting (but wrong) to think that we are restricted to expanding functions
that obey the same boundary conditions as the eigenfunctions themselves.
Thus, we might erroneously expect that the ${\bf E}_n$ are good only for expanding
functions whose divergence vanishes and have vanishing tangential boundary
components, or that the $\boldsymbol{\eta}_n$ can expand out only curl-free vector fields with
vanishing normal boundary component. That this supposition can be false
was exposed in section 2.2.3, where we showed that functions that are zero at
the endpoints of an interval can be used to expand out functions that are not
zero there. The key point is that each of our four families of ${\bf u}_n$ constitutes
a complete orthonormal set in $L^2_{\rm vec}(\Omega)$, and can therefore be used to expand
any vector field. As a consequence, the infinite sum $\sum a_n {\bf B}_n \in {\mathcal T}$ can, for
example, represent any vector-valued function ${\bf u} \in L^2_{\rm vec}(\Omega)$ provided only that
${\bf u}$ possesses no component lying either in the subspace ${\mathcal L}$ of the longitudinal
eigenfunctions $\boldsymbol{\beta}_n$, or in the nullspace ${\mathcal N}$.


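a) Let ${\mathcal T} = \langle {\bf E}_n \rangle$ be the space of functions representable as infinite sums of
the ${\bf E}_n$. Show that

$$
\langle {\bf E}_n \rangle^\perp = \{ {\bf u} : {\rm curl}\,{\bf u} = 0 \hbox{ within } \Omega,\; {\bf n}\times{\bf u} = 0 \hbox{ on } \partial\Omega \}.
$$

Find the corresponding perpendicular spaces for each of the other eight
orthogonal decomposition spaces.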

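b) Exploit your knowledge of $\langle {\bf E}_n \rangle^\perp$ acquired in part (a) to show that
$\langle {\bf E}_n \rangle$ itself is the Hilbert space

$$
\langle {\bf E}_n \rangle = \{ {\bf u} : {\rm div}\,{\bf u} = 0 \hbox{ within } \Omega,\; \hbox{no condition on } \partial\Omega \}.
$$

Similarly show that

$$
\begin{aligned}
\langle \boldsymbol{\epsilon}_n \rangle &= \{ {\bf u} : {\rm curl}\,{\bf u} = 0 \hbox{ within } \Omega,\; {\bf n}\times{\bf u} = 0 \hbox{ on } \partial\Omega \},\\
\langle \boldsymbol{\eta}_n \rangle &= \{ {\bf u} : {\rm curl}\,{\bf u} = 0 \hbox{ within } \Omega,\; \hbox{no condition on } \partial\Omega \},\\
\langle {\bf H}_n \rangle &= \{ {\bf u} : {\rm div}\,{\bf u} = 0 \hbox{ within } \Omega,\; {\bf n}\cdot{\bf u} = 0 \hbox{ on } \partial\Omega \},\\
\langle \boldsymbol{\beta}_n \rangle &= \{ {\bf u} : {\rm curl}\,{\bf u} = 0 \hbox{ within } \Omega,\; {\bf n}\times{\bf u} = 0 \hbox{ on } \partial\Omega \},\\
\langle {\bf B}_n \rangle &= \{ {\bf u} : {\rm div}\,{\bf u} = 0 \hbox{ within } \Omega,\; {\bf n}\cdot{\bf u} = 0 \hbox{ on } \partial\Omega \}.
\end{aligned}
$$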


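c) Conclude from the previous part that any vector field ${\bf u} \in L^2_{\rm vec}(\Omega)$
can be uniquely decomposed as the $L^2_{\rm vec}(\Omega)$ orthogonal sum

$$
{\bf u} = \nabla\phi + {\rm curl}\,{\bf w} + {\bf h},
$$

where $\nabla\phi \in {\mathcal L}$, ${\rm curl}\,{\bf w} \in {\mathcal T}$, and ${\bf h} \in {\mathcal N}$, under each of the following sets
of conditions:

i) The scalar $\phi$ is unrestricted, but ${\bf w}$ obeys ${\bf n}\times{\bf w} = 0$ on $\partial\Omega$, and
the harmonic ${\bf h}$ obeys ${\bf n}\cdot{\bf h} = 0$ on $\partial\Omega$. (The condition on ${\bf w}$ makes
${\rm curl}\,{\bf w}$ have vanishing normal boundary component.)

ii) The scalar $\phi$ is zero on $\partial\Omega$, while ${\bf w}$ is unrestricted on $\partial\Omega$. The
harmonic ${\bf h}$ obeys ${\bf n}\times{\bf h} = 0$ on $\partial\Omega$. (The condition on $\phi$ makes $\nabla\phi$
have zero tangential boundary component.)

iii) The scalar $\phi$ is zero on $\partial\Omega$, the vector ${\bf w}$ obeys ${\bf n}\times{\bf w} = 0$ on
$\partial\Omega$, while the harmonic ${\bf h}$ requires no boundary condition. (The
conditions on $\phi$ and ${\bf w}$ make $\nabla\phi$ have zero tangential boundary
component and ${\rm curl}\,{\bf w}$ have vanishing normal boundary component.)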
d) As an illustration of the practical distinctions between the
decompositions in part (c), take $\Omega$ to be the unit cube in ${\bf R}^3$, and ${\bf u} = (1,0,0)$ a
constant vector field. Show that with conditions (i) we have ${\bf u} \in {\mathcal L}$, but
for (ii) we have ${\bf u} \in {\mathcal T}$, and for (iii) we have ${\bf u} \in {\mathcal N}$.


We see that the Hodge-Weyl decompositions of the eigenspaces correspond 
one-to-one with the Helmholtz-Hodge decompositions of problem 6.16. 


Chapter 7 


The Mathematics of Real 
Waves 


Waves are found everywhere in the physical world, but we often need more 
than the simple wave equation to understand them. The principal compli- 
cations are non-linearity and dispersion. In this chapter we will describe the 
mathematics lying behind some commonly observed, but still fascinating, 
phenomena. 


7.1 Dispersive waves 


In this section we will investigate the effects of dispersion, the dependence 
of the speed of propagation on the frequency of the wave. We will see that 
dispersion has a profound effect on the behaviour of a wave-packet. 


7.1.1 Ocean waves 


The most commonly seen dispersive waves are those on the surface of water. 
Although often used to illustrate wave motion in class demonstrations, these 
waves are not as simple as they seem. 

In chapter one we derived the equations governing the motion of water
with a free surface. Now we will solve these equations. Recall that we
described the flow by introducing a velocity potential $\phi$ such that ${\bf v} = \nabla\phi$,
and a variable $h(x,t)$ which is the depth of the water at abscissa $x$.


Figure 7.1: Water with a free surface. 


Again looking back to chapter one, we see that the fluid motion is determined 
by imposing 


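$$
\nabla^2\phi = 0 \tag{7.1}
$$

everywhere in the bulk of the fluid, together with boundary conditions

$$
\frac{\partial\phi}{\partial y} = 0, \quad \hbox{on } y = 0, \tag{7.2}
$$
$$
\frac{\partial\phi}{\partial t} + \frac{1}{2}(\nabla\phi)^2 + gy = 0, \quad \hbox{on the free surface } y = h, \tag{7.3}
$$
$$
\frac{\partial h}{\partial t} - \frac{\partial\phi}{\partial y} + \frac{\partial h}{\partial x}\frac{\partial\phi}{\partial x} = 0, \quad \hbox{on the free surface } y = h. \tag{7.4}
$$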


Recall the physical interpretation of these equations: The vanishing of the
Laplacian of the velocity potential simply means that the bulk flow is
incompressible

$$
{\rm div}\,{\bf v} = \nabla^2\phi = 0. \tag{7.5}
$$

The first two of the boundary conditions are also easy to interpret: The first
says that no water escapes through the lower boundary at $y = 0$. The second,
a form of Bernoulli’s equation, asserts that the free surface is everywhere at
constant (atmospheric) pressure. The remaining boundary condition is more
obscure. It states that a fluid particle initially on the surface stays on the
surface. Remember that we set $f(x,y,t) = h(x,t) - y$, so the water surface
is given by $f(x,y,t) = 0$. If the surface particles are carried with the flow
then the convective derivative of $f$,

$$
\frac{Df}{Dt} \stackrel{\rm def}{=} \frac{\partial f}{\partial t} + ({\bf v}\cdot\nabla)f, \tag{7.6}
$$

should vanish on the free surface. Using ${\bf v} = \nabla\phi$ and the definition of $f$, this
reduces to

$$
\frac{\partial h}{\partial t} + \frac{\partial\phi}{\partial x}\frac{\partial h}{\partial x} - \frac{\partial\phi}{\partial y} = 0, \tag{7.7}
$$

which is indeed the last boundary condition.

Using our knowledge of solutions of Laplace’s equation, we can immediately
write down a wave-like solution satisfying the boundary condition at
$y = 0$

$$
\phi(x,y,t) = a\cosh(ky)\cos(kx - \omega t). \tag{7.8}
$$


The tricky part is satisfying the remaining two boundary conditions. The 
difficulty is that they are non-linear, and so couple modes with different 
wave-numbers. We will circumvent the difficulty by restricting ourselves to 
small amplitude waves, for which the boundary conditions can be linearized. 
Suppressing all terms that contain a product of two or more small quantities, 
we are left with 


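$$
\frac{\partial\phi}{\partial t} + gh = 0, \tag{7.9}
$$
$$
\frac{\partial h}{\partial t} - \frac{\partial\phi}{\partial y} = 0. \tag{7.10}
$$

Because $\phi$ is already a small quantity, and the wave amplitude is a small
quantity, linearization requires that these equations should be imposed at
the equilibrium surface of the fluid $y = h_0$. It is convenient to eliminate $h$ to
get

$$
\frac{\partial^2\phi}{\partial t^2} + g\frac{\partial\phi}{\partial y} = 0, \quad \hbox{on } y = h_0. \tag{7.11}
$$

Inserting (7.8) into this boundary condition leads to the dispersion equation

$$
\omega^2 = gk\tanh kh_0, \tag{7.12}
$$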


relating the frequency to the wave-number. 
Two limiting cases are of particular interest: 
i) Long waves on shallow water: Here $kh_0 \ll 1$, and, in this limit,

$$
\omega = k\sqrt{gh_0}.
$$

ii) Waves on deep water: Here, $kh_0 \gg 1$, leading to $\omega = \sqrt{gk}$.
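
The two limits can be checked numerically. The following is a minimal sketch, not part of the original text: the value of g, the wave-number and the two depths are illustrative assumptions, and the function name `omega` is hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def omega(k, h0, g=G):
    """Angular frequency from the dispersion equation (7.12)."""
    return math.sqrt(g * k * math.tanh(k * h0))

k = 0.5  # wave-number in 1/m, chosen only for illustration

# Shallow water, k*h0 << 1: omega should approach k*sqrt(g*h0).
h_shallow = 0.01
shallow_ratio = omega(k, h_shallow) / (k * math.sqrt(G * h_shallow))

# Deep water, k*h0 >> 1: omega should approach sqrt(g*k).
h_deep = 100.0
deep_ratio = omega(k, h_deep) / math.sqrt(G * k)

print(shallow_ratio, deep_ratio)  # both ratios should be close to 1
```

Both ratios approach unity as the depth is pushed further into the respective limit, which is all the sketch is meant to show.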
For deep water, the velocity potential becomes

$$
\phi(x,y,t) = ae^{k(y-h_0)}\cos(kx - \omega t). \tag{7.13}
$$

We see that the disturbance due to the surface wave dies away exponentially,
and becomes very small only a few wavelengths below the surface.


Remember that the velocity of the fluid is ${\bf v} = \nabla\phi$. To follow the motion
of individual particles of fluid we must solve the equations

$$
\frac{dx}{dt} = v_x = -ake^{k(y-h_0)}\sin(kx - \omega t),
$$
$$
\frac{dy}{dt} = v_y = ake^{k(y-h_0)}\cos(kx - \omega t). \tag{7.14}
$$

This is a system of coupled non-linear differential equations, but to find
the small amplitude motion of particles at the surface we may, to a first
approximation, set $x = x_0$, $y = h_0$ on the right-hand side. The orbits of the
surface particles are therefore approximately

$$
x(t) = x_0 - \frac{ak}{\omega}\cos(kx_0 - \omega t),
$$
$$
y(t) = y_0 - \frac{ak}{\omega}\sin(kx_0 - \omega t). \tag{7.15}
$$


Figure 7.2: Circular orbits in deep water surface waves. 


For right-moving waves, the particle orbits are clockwise circles. At the 
wave-crest the particles move in the direction of the wave propagation; in 
the troughs they move in the opposite direction. Figure 7.2 shows that this 
motion results in an up-down-asymmetric cycloidal wave profile. 

When the effect of the bottom becomes significant, the circular orbits 
deform into ellipses. For shallow water waves, the motion is principally
back-and-forth with motion in the y direction almost negligible.
7.1.2 Group velocity


The most important effect of dispersion is that the group velocity of the waves 
— the speed at which a wave-packet travels — differs from the phase velocity 
— the speed at which individual wave-crests move. The group velocity is 
also the speed at which the energy associated with the waves travels. 

Suppose that we have waves with dispersion equation $\omega = \omega(k)$. A
right-going wave-packet of finite extent, and with initial profile $\varphi(x)$, can be
Fourier analyzed to give

$$
\varphi(x) = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, A(k)e^{ikx}. \tag{7.16}
$$

Figure 7.3: A right-going wavepacket.

At later times this will evolve to

$$
\varphi(x,t) = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, A(k)e^{ikx - i\omega(k)t}. \tag{7.17}
$$

Let us suppose for the moment that $A(k)$ is non-zero only for a narrow band
of wavenumbers around $k_0$, and that, restricted to this narrow band, we can
approximate the full $\omega(k)$ dispersion equation by

$$
\omega(k) \approx \omega_0 + U(k - k_0). \tag{7.18}
$$

Thus

$$
\varphi(x,t) = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, A(k)e^{ik(x-Ut)}e^{-i(\omega_0 - Uk_0)t}. \tag{7.19}
$$

Comparing this with the Fourier expression for the initial profile, we find
that

$$
\varphi(x,t) = e^{-i(\omega_0 - Uk_0)t}\varphi(x - Ut). \tag{7.20}
$$
The pulse envelope therefore travels at speed $U$. This velocity

$$
U \equiv \frac{\partial\omega}{\partial k} \tag{7.21}
$$

is the group velocity. The individual wave crests, on the other hand, move
at the phase velocity $\omega(k)/k$.

When the initial pulse contains a broad range of frequencies we can still 
explore its evolution. We make use of a powerful tool for estimating the be- 
havior of integrals that contain a large parameter. In this case the parameter 
is the time t. We begin by writing the Fourier representation of the wave as 


$$
\varphi(x,t) = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, A(k)e^{it\psi(k)}, \tag{7.22}
$$

where

$$
\psi(k) = k\left(\frac{x}{t}\right) - \omega(k). \tag{7.23}
$$

Now look at the behaviour of this integral as $t$ becomes large, but while we
keep the ratio $x/t$ fixed. Since $t$ is very large, any variation of $\psi$ with $k$
will make the integrand a very rapidly oscillating function of $k$. Cancellation
between adjacent intervals with opposite phase will cause the net contribution
from such a region of the $k$ integration to be very small. The principal
contribution will come from the neighbourhood of stationary phase points,
i.e. points where

$$
0 = \frac{d\psi}{dk} = \frac{x}{t} - \frac{\partial\omega}{\partial k}. \tag{7.24}
$$

This means that, at points in space where $x/t = U$, we will only get
contributions from the Fourier components with wave-number satisfying

$$
U = \frac{\partial\omega}{\partial k}. \tag{7.25}
$$

The initial packet will therefore spread out, with those components of the
wave having wave-number $k$ travelling at speed

$$
U_{\rm group} = \frac{\partial\omega}{\partial k}. \tag{7.26}
$$


This is the same expression for the group velocity that we obtained in the 
narrow-band case. Again this speed of propagation should be contrasted 
with that of the wave-crests, which travel at 


$$
U_{\rm phase} = \frac{\omega}{k}. \tag{7.27}
$$

The “stationary phase” argument may seem a little hand-waving, but it can
be developed into a systematic approximation scheme. We will do this in 
chapter 19. 

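Example: Water Waves. The dispersion equation for waves on deep water is
$\omega = \sqrt{gk}$. The phase velocity is therefore

$$
U_{\rm phase} = \sqrt{\frac{g}{k}}, \tag{7.28}
$$

whilst the group velocity is

$$
U_{\rm group} = \frac{1}{2}\sqrt{\frac{g}{k}} = \frac{1}{2}U_{\rm phase}. \tag{7.29}
$$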


This difference is easily demonstrated by tossing a stone into a pool and 
observing how individual wave-crests overtake the circular wave packet and 
die out at the leading edge, while new crests and troughs come into being at 
the rear and make their way to the front. 
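
The factor of one-half can be confirmed by differentiating the dispersion equation numerically. The sketch below is an illustration only: it uses the finite-depth relation (7.12), a central difference for the derivative, and assumed values for g, k and the depths. The function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def omega(k, h0):
    """Finite-depth dispersion equation (7.12)."""
    return math.sqrt(G * k * math.tanh(k * h0))

def group_velocity(k, h0, dk=1e-6):
    """Central-difference estimate of d(omega)/dk."""
    return (omega(k + dk, h0) - omega(k - dk, h0)) / (2.0 * dk)

def phase_velocity(k, h0):
    return omega(k, h0) / k

k = 2.0  # wave-number in 1/m, chosen only for illustration

# Deep water (k*h0 >> 1): expect group/phase close to 1/2.
deep_ratio = group_velocity(k, 1000.0) / phase_velocity(k, 1000.0)

# Shallow water (k*h0 << 1): waves are nearly non-dispersive, ratio close to 1.
shallow_ratio = group_velocity(k, 1e-3) / phase_velocity(k, 1e-3)

print(deep_ratio, shallow_ratio)
```

The shallow-water case is included for contrast: there the two velocities nearly coincide, so a packet and its crests travel together.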

This result can be extended to three dimensions with 


$$
U^i_{\rm group} = \frac{\partial\omega}{\partial k_i}. \tag{7.30}
$$


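Example: de Broglie Waves. The plane-wave solutions of the time-dependent
Schrödinger equation

$$
i\frac{\partial\psi}{\partial t} = -\frac{1}{2m}\nabla^2\psi \tag{7.31}
$$

are

$$
\psi = e^{i{\bf k}\cdot{\bf r} - i\omega t}, \tag{7.32}
$$

with

$$
\omega({\bf k}) = \frac{1}{2m}{\bf k}^2. \tag{7.33}
$$

The group velocity is therefore

$$
{\bf v}_{\rm group} = \frac{1}{m}{\bf k}, \tag{7.34}
$$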


which is the classical velocity of the particle. 
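
A one-dimensional packet built from these plane waves can be summed directly to see its envelope move at the group velocity. The following is a rough sketch under stated assumptions: units with ħ = 1, a Gaussian amplitude A(k), and illustrative values for m, k0, the width and the time. The names `amplitude` and `packet` are hypothetical.

```python
import cmath
import math

m = 1.0      # particle mass, in units with hbar = 1 (assumed)
k0 = 5.0     # central wave-number of the packet (assumed)
sigma = 0.5  # width of A(k) in wave-number space (assumed)
t = 2.0      # time at which the packet is examined

def omega(k):
    return k * k / (2.0 * m)  # dispersion equation (7.33) in one dimension

def amplitude(k):
    return math.exp(-((k - k0) ** 2) / (2.0 * sigma ** 2))

# Discretize the Fourier integral (7.17) over k0 +/- 4*sigma.
n_k = 161
ks = [k0 - 4.0 * sigma + 8.0 * sigma * j / (n_k - 1) for j in range(n_k)]

def packet(x, time):
    """Unnormalized sum over plane waves A(k) exp(i(kx - omega t))."""
    return sum(amplitude(k) * cmath.exp(1j * (k * x - omega(k) * time))
               for k in ks)

xs = [0.05 * j for j in range(401)]  # sample points 0 <= x <= 20
peak_x = max(xs, key=lambda x: abs(packet(x, t)))
expected_x = (k0 / m) * t            # group velocity (7.34) times elapsed time

print(peak_x, expected_x)
```

The envelope maximum sits at the position reached by a classical particle of momentum k0, even though the individual crests move at only half that speed.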




7.1.3 Wakes 


There are many circumstances when waves are excited by an object moving at
a constant velocity through a background medium, or by a stationary object 
immersed in a steady flow. The resulting wakes carry off energy, and therefore 
create wave drag. Wakes are involved, for example, in sonic booms, Cerenkov 
radiation, the Landau criterion for superfluidity, and Landau damping of 
plasma oscillations. Here, we will consider some simple water-wave analogues 
of these effects. The common principle for all wakes is that the resulting wave 
pattern is time independent when observed from the object exciting it. 
Example: Obstacle in a Stream. Consider a log lying submerged in a rapidly 
flowing stream. 


Figure 7.4: Log in a stream. 


The obstacle disturbs the water and generates a train of waves. If the log lies 
athwart the stream, the problem is essentially one-dimensional and easy to 
analyse. The essential point is that the distance of the wavecrests from the log 
does not change with time, and therefore the wavelength of the disturbance 
the log creates is selected by the condition that the phase velocity of the wave
coincide with the velocity of the mean flow.¹ The group velocity does come
into play, however. If the group velocity of the waves is less than the phase
velocity, the energy being deposited in the wave-train by the disturbance will 
be swept downstream, and the wake will lie behind the obstacle. If the group 
velocity is higher than the phase velocity, and this is the case with very short 
wavelength ripples on water where surface tension is more important than 
gravity, the energy will propagate against the flow, and so the ripples appear 
upstream of the obstacle. 


¹In his book Waves in Fluids, M. J. Lighthill quotes Robert Frost on this phenomenon:


The black stream, catching on a sunken rock, 
Flung backward on itself in one white wave, 
And the white water rode the black forever, 
Not gaining but not losing. 
Figure 7.5: Kelvin’s ship-wave construction.


Example: Kelvin Ship Waves. A more subtle problem is the pattern of waves 
left behind by a ship on deep water. The shape of the pattern is determined 
by the group velocity for deep-water waves being one-half that of the phase 
velocity. 

How the wave pattern is formed can be understood from figure 7.5. In 
order that the pattern of wavecrests be time independent, the waves emitted 
in the direction AC must have phase velocity such that their crests travel 
from A to C while the ship goes from A to B. The crest of the wave emitted 
from the bow of the ship in the direction AC will therefore lie along the line 
BC — or at least there would be a wave crest on this line if the emitted 
wave energy travelled at the phase velocity. The angle at C must be a right 
angle because the direction of propagation is perpendicular to the wave- 
crests. Euclid, by virtue of his angle-in-a-semicircle theorem, now tells us 
that the locus of all possible points C (for all directions of wave emission) 
is the larger circle. Because, however, the wave energy only travels at one- 
half the phase velocity, the waves going in the direction AC actually have 
significant amplitude only on the smaller circle, which has half the radius of 
the larger. The wake therefore lies on, and within, the Kelvin wedge, whose 
boundary lies at an angle $\theta$ to the ship’s path. This angle is determined by
the ratio OD/OB = 1/3 to be

$$
\theta = \sin^{-1}(1/3) = 19.5^\circ. \tag{7.35}
$$


Remarkably, this angle, and hence the width of the wake, is independent of 
the speed of the ship. 
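
The construction can be reproduced numerically by scanning over emission directions. In the sketch below, which is an illustration rather than part of the original argument, the ship is taken to travel a unit distance from A = (0, 0) to B = (1, 0); the coordinates, the function name `wedge_angle` and the scan resolution are all assumptions.

```python
import math

def wedge_angle(alpha, ratio=0.5):
    """Angle at the ship B subtended by wave energy emitted at angle alpha.

    A crest emitted from A at angle alpha to the ship's path would reach the
    point C at distance cos(alpha) from A (the larger circle). The energy
    travels only `ratio` = (group velocity)/(phase velocity) times as far.
    """
    r = ratio * math.cos(alpha)
    x, y = r * math.cos(alpha), r * math.sin(alpha)
    return math.atan2(y, 1.0 - x)

# Scan emission directions between 0 and 90 degrees.
alphas = [0.5 * math.pi * j / 20000 for j in range(20001)]
theta_max = max(wedge_angle(a) for a in alphas)

print(math.degrees(theta_max))             # half-angle found by the scan
print(math.degrees(math.asin(1.0 / 3.0)))  # value quoted in (7.35)
```

The ship's speed never enters the calculation; only the ratio of group to phase velocity does, which is why the wedge angle is the same for all ships on deep water.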
The waves actually on the edge of the wedge are usually the most prominent,
and they will have crests perpendicular to the line AD. This orientation
is indicated on the left hand figure, and reproduced as the predicted pattern
of wavecrests on the right. The prediction should be compared with the wave
systems in figures 7.6 and 7.7.

Figure 7.6: Large-scale Kelvin wakes. (Image source: US Navy)


7.1.4 Hamilton’s theory of rays 


We have seen that wave packets travel at a frequency-dependent group ve- 
locity. We can extend this result to study the motion of waves in weakly 
inhomogeneous media, and so derive an analogy between the “geometric op- 
tics” limit of wave motion and classical dynamics. 

Consider a packet composed of a roughly uniform train of waves spread 
out over a region that is substantially longer and wider than their mean wave- 
length. The essential feature of such a wave train is that at any particular 
point of space and time, ${\bf x}$ and $t$, it has a definite phase $\Theta({\bf x},t)$. Once we
know this phase, we can define the local frequency $\omega$ and wave-vector ${\bf k}$ by

$$
\omega = -\left(\frac{\partial\Theta}{\partial t}\right)_x, \qquad
k_i = \left(\frac{\partial\Theta}{\partial x_i}\right)_t. \tag{7.36}
$$

Figure 7.7: Small-scale Kelvin wake. (Photograph by Fabrice Neyret)

These definitions are motivated by the idea that

$$
\Theta({\bf x},t) \sim {\bf k}\cdot{\bf x} - \omega t, \tag{7.37}
$$


at least locally. 

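We wish to understand how ${\bf k}$ changes as the wave propagates through a
slowly varying medium. We introduce the inhomogeneity by assuming that
the dispersion equation $\omega = \omega({\bf k})$, which is initially derived for a uniform
medium, can be extended to $\omega = \omega({\bf k},{\bf x})$, where the ${\bf x}$ dependence arises,
for example, as a result of a position-dependent refractive index. This
assumption is only an approximation, but it is a good approximation when the
distance over which the medium changes is much larger than the distance
between wavecrests.

Applying the equality of mixed partials to the definitions of ${\bf k}$ and $\omega$ gives
us

$$
\left(\frac{\partial k_i}{\partial t}\right)_x = -\left(\frac{\partial\omega}{\partial x_i}\right)_t, \qquad
\left(\frac{\partial k_i}{\partial x_j}\right)_t = \left(\frac{\partial k_j}{\partial x_i}\right)_t. \tag{7.38}
$$

The subscripts indicate what is being left fixed when we differentiate. We
must be careful about this, because we want to use the dispersion equation
to express $\omega$ as a function of ${\bf k}$ and ${\bf x}$, and the wave-vector ${\bf k}$ will itself be a
function of ${\bf x}$ and $t$.

Taking this dependence into account, we write

$$
\left(\frac{\partial\omega}{\partial x_i}\right)_t = \left(\frac{\partial\omega}{\partial x_i}\right)_k
+ \left(\frac{\partial\omega}{\partial k_j}\right)_x \left(\frac{\partial k_j}{\partial x_i}\right)_t. \tag{7.39}
$$

We now use (7.38) to rewrite this as

$$
\left(\frac{\partial k_i}{\partial t}\right)_x + \left(\frac{\partial\omega}{\partial k_j}\right)_x \left(\frac{\partial k_i}{\partial x_j}\right)_t
= -\left(\frac{\partial\omega}{\partial x_i}\right)_k. \tag{7.40}
$$

Interpreting the left hand side as a convective derivative

$$
\frac{dk_i}{dt} = \left(\frac{\partial k_i}{\partial t}\right)_x + ({\bf v}_g\cdot\nabla)k_i,
$$

we read off that

$$
\frac{dk_i}{dt} = -\left(\frac{\partial\omega}{\partial x_i}\right)_k \tag{7.41}
$$

provided we are moving at velocity

$$
\frac{dx_i}{dt} = ({\bf v}_g)_i = \left(\frac{\partial\omega}{\partial k_i}\right)_x. \tag{7.42}
$$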


Since this is the group velocity, the packet of waves is actually travelling at 
this speed. The last two equations therefore tell us how the orientation and 
wavelength of the wave train evolve if we ride along with the packet as it is 
refracted by the inhomogeneity. 

The formulae

$$
\dot{\bf k} = -\frac{\partial\omega}{\partial{\bf x}}, \qquad
\dot{\bf x} = \frac{\partial\omega}{\partial{\bf k}}, \tag{7.43}
$$

are Hamilton’s ray equations. These Hamilton equations are identical in form
to Hamilton’s equations for classical mechanics

$$
\dot{\bf p} = -\frac{\partial H}{\partial{\bf x}}, \qquad
\dot{\bf x} = \frac{\partial H}{\partial{\bf p}}, \tag{7.44}
$$

except that ${\bf k}$ is playing the role of the canonical momentum, ${\bf p}$, and $\omega({\bf k},{\bf x})$
replaces the Hamiltonian, $H({\bf p},{\bf x})$. This formal equivalence of geometric
optics and classical mechanics was a mystery in Hamilton’s time. Today we
understand that classical mechanics is nothing but the geometric optics limit
of wave mechanics.
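
Hamilton’s ray equations are ordinary differential equations and can be integrated step by step. The sketch below traces a single ray in two dimensions under assumptions that are not in the text: a non-dispersive medium with ω = c(y)|k|, a hypothetical linear speed profile c(y), and a standard fourth-order Runge-Kutta step. Because this ω depends on neither t nor x, both ω and k_x should stay constant along the ray, which is Snell’s law in this setting.

```python
import math

def c(y):
    return 1.0 + 0.5 * y  # hypothetical wave-speed profile

def dc_dy(y):
    return 0.5            # derivative of the profile above

def omega(kx, ky, y):
    return c(y) * math.hypot(kx, ky)

def rhs(state):
    """Right-hand sides of the ray equations for omega = c(y)|k|."""
    x, y, kx, ky = state
    kmag = math.hypot(kx, ky)
    return (c(y) * kx / kmag,   # dx/dt  =  d(omega)/d(kx)
            c(y) * ky / kmag,   # dy/dt  =  d(omega)/d(ky)
            0.0,                # dkx/dt = -d(omega)/dx = 0
            -dc_dy(y) * kmag)   # dky/dt = -d(omega)/dy

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, slope, h):
        return tuple(si + h * ki for si, ki in zip(s, slope))
    s1 = rhs(state)
    s2 = rhs(shifted(state, s1, dt / 2.0))
    s3 = rhs(shifted(state, s2, dt / 2.0))
    s4 = rhs(shifted(state, s3, dt))
    return tuple(s + dt * (a + 2.0 * b + 2.0 * g + d) / 6.0
                 for s, a, b, g, d in zip(state, s1, s2, s3, s4))

state = (0.0, 0.0, 1.0, 1.0)  # initial (x, y, kx, ky), chosen for illustration
omega_start = omega(state[2], state[3], state[1])
for _ in range(1000):
    state = rk4_step(state, 0.001)
omega_end = omega(state[2], state[3], state[1])

print(state)
print(omega_start, omega_end)
```

The ray bends towards the region of lower wave speed as k_y decreases, exactly as a particle trajectory would bend under the Hamiltonian H = c(y)|p|.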


7.2 Making waves 


Many waves occurring in nature are generated by the energy of some steady 
flow being stolen away to drive an oscillatory motion. Familiar examples 
include the music of a flute and the waves raised on the surface of water by 
the wind. The latter process is quite subtle and was not understood until the 
work of J. W. Miles in 1957. Miles showed that in order to excite waves the 
wind speed has to vary with the height above the water, and that waves of 
a given wavelength take energy only from the wind at that height where the 
windspeed matches the phase velocity of the wave. The resulting resonant 
energy transfer turns out to have analogues in many branches of science. In 
this section we will exhibit this phenomenon in the simpler situation where 
the varying flow is that of the water itself. 


7.2.1 Rayleigh’s equation 


Consider water flowing in a shallow channel where friction forces keep the
water in contact with the stream-bed from moving. We will show that the resulting
shear flow is unstable to the formation of waves on the water surface. The 
consequences of this instability are most often seen in a thin sheet of water 
running down the face of a dam. The sheet starts off flowing smoothly, but, 
as the water descends, waves form and break, and the water reaches the 
bottom in irregular pulses called roll waves. 


It is easiest to describe what is happening from the vantage of a reference 
frame that rides along with the surface water. In this frame the velocity 
profile of the flow will be as shown in figure 7.8. 
Figure 7.8: The velocity profile $U(y)$ in a frame in which the surface is at
rest.


Since the flow is incompressible but not irrotational, we will describe the
motion by using a stream function $\Psi$, in terms of which the fluid velocity is
given by

$$
v_x = -\partial_y\Psi, \qquad v_y = \partial_x\Psi. \tag{7.45}
$$

This parameterization automatically satisfies $\nabla\cdot{\bf v} = 0$, while the ($z$
component of) the vorticity becomes

$$
\Omega \equiv \partial_x v_y - \partial_y v_x = \nabla^2\Psi. \tag{7.46}
$$

We will consider a stream function of the form²

$$
\Psi(x,y,t) = \psi_0(y) + \psi(y)e^{ikx - i\omega t}, \tag{7.47}
$$

where $\psi_0$ obeys $-\partial_y\psi_0 = v_x = U(y)$, and describes the horizontal mean flow.
The term containing $\psi(y)$ represents a small-amplitude wave disturbance
superposed on the mean flow. We will investigate whether this disturbance
grows or decreases with time.

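Euler’s equation can be written as

$$
\dot{\bf v} - {\bf v}\times\boldsymbol{\Omega} = -\nabla\left(P + \frac{v^2}{2} + gy\right), \tag{7.48}
$$

where $\boldsymbol{\Omega} = {\rm curl}\,{\bf v}$. Taking the curl of this, and taking into account the two
dimensional character of the problem, we find that

$$
\partial_t\Omega + ({\bf v}\cdot\nabla)\Omega = 0. \tag{7.49}
$$


²The physical stream function is, of course, the real part of this expression.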
This, a general property of two-dimensional incompressible motion, says that
vorticity is convected with the flow. We now express (7.49) in terms of $\Psi$,
when it becomes

$$
\nabla^2\dot\Psi + ({\bf v}\cdot\nabla)\nabla^2\Psi = 0. \tag{7.50}
$$

Substituting the expression (7.47) into (7.50), and keeping only terms of first
order in $\psi$, gives

$$
-i\omega\left(\frac{d^2}{dy^2} - k^2\right)\psi + iUk\left(\frac{d^2}{dy^2} - k^2\right)\psi + ik\psi\,\partial_y(-\partial_y U) = 0,
$$

or

$$
\left(\frac{d^2}{dy^2} - k^2\right)\psi - \left(\frac{\partial^2 U}{\partial y^2}\right)\frac{1}{(U - \omega/k)}\,\psi = 0. \tag{7.51}
$$

This is Rayleigh’s equation.³ If only the first term were present, it would
have solutions $\psi \propto e^{\pm ky}$, and we would have recovered the results of section
7.1.1. The second term is significant, however. It will diverge if there is a
point $y_c$ such that $U(y_c) = \omega/k$. That is, it diverges when there is a depth at
which the flow speed coincides with the phase velocity of the wave disturbance,
thus allowing a resonant interaction between the wave and flow. An actual infinity
in (7.51) will be evaded, though, because $\omega$ will gain a small imaginary part
$\omega \to \omega_R + i\gamma$. A positive imaginary part means that the wave amplitude is
growing exponentially with time. A negative imaginary part means that the
wave is being damped. With $\gamma$ included, we then have

$$
\begin{aligned}
\frac{1}{(U - \omega/k)} &\approx P\left(\frac{1}{U - \omega_R/k}\right) + i\pi\,{\rm sgn}\left(\frac{\gamma}{k}\right)\delta\big(U(y) - \omega_R/k\big)\\
&= P\left(\frac{1}{U - \omega_R/k}\right) + i\pi\,{\rm sgn}\left(\frac{\gamma}{k}\right)\left|\frac{\partial U}{\partial y}\right|^{-1}_{y_c}\delta(y - y_c).
\end{aligned}
\tag{7.52}
$$


To specify the problem fully we need to impose boundary conditions on
$\psi(y)$. On the lower surface we can set $\psi(0) = 0$, as this will keep the fluid
at rest there. On the upper surface $y = h$ we apply Euler’s equation

$$
\dot{\bf v} - {\bf v}\times\boldsymbol{\Omega} = -\nabla\left(P + \frac{v^2}{2} + gh\right). \tag{7.53}
$$


³Lord Rayleigh, “On the stability or instability of certain fluid motions.” Proc. Lond.
Math. Soc. 11 (1880).
We observe that $P$ is constant, being atmospheric pressure, and the $v^2/2$ can
be neglected as it is of second order in the disturbance. Then, considering
the $x$ component, we have

$$
-\partial_x gh = -g\,\partial_x\int^t v_y\,dt = -g\left(\frac{k^2}{i\omega}\right)\psi \tag{7.54}
$$

on the free surface. To lowest order we can apply the boundary condition on
the equilibrium free surface $y = y_0$. The boundary condition is therefore

$$
\frac{1}{\psi}\frac{d\psi}{dy} + \frac{k}{\omega}\frac{\partial U}{\partial y} = g\frac{k^2}{\omega^2}, \qquad y = y_0. \tag{7.55}
$$

We usually have $\partial U/\partial y = 0$ near the surface, so this simplifies to

$$
\frac{1}{\psi}\frac{d\psi}{dy} = g\frac{k^2}{\omega^2}. \tag{7.56}
$$

That this is sensible can be confirmed by considering the case of waves on
still, deep water, where $\psi(y) = e^{|k|y}$. The boundary condition then reduces
to $|k| = gk^2/\omega^2$, or $\omega^2 = g|k|$, which is the correct dispersion equation for
such waves.

We find the corresponding dispersion equation for waves on shallow flowing
water by computing

$$
\left.\frac{1}{\psi}\frac{d\psi}{dy}\right|_{y_0} \tag{7.57}
$$


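from Rayleigh’s equation (7.51). Multiplying by $\psi^*$ and integrating gives

$$
0 = \int_0^{y_0} dy\left\{\psi^*\left(\frac{d^2}{dy^2} - k^2\right)\psi + k\left(\frac{\partial^2 U}{\partial y^2}\right)\frac{1}{(\omega - Uk)}|\psi|^2\right\}. \tag{7.58}
$$

An integration by parts then gives

$$
\left[\psi^*\frac{d\psi}{dy}\right]_0^{y_0} = \int_0^{y_0} dy\left\{\left|\frac{d\psi}{dy}\right|^2 + k^2|\psi|^2 + \left(\frac{\partial^2 U}{\partial y^2}\right)\frac{1}{(U - \omega/k)}|\psi|^2\right\}. \tag{7.59}
$$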


The lower limit makes no contribution, since $\psi^*$ is zero there. On using (7.52)
and taking the imaginary part, we find

$$
{\rm Im}\left(\psi^*\frac{d\psi}{dy}\right)_{y_0} = {\rm sgn}\left(\frac{\gamma}{k}\right)\pi\left(\frac{\partial^2 U}{\partial y^2}\right)_{y_c}\left|\frac{\partial U}{\partial y}\right|^{-1}_{y_c}|\psi(y_c)|^2, \tag{7.60}
$$

or

$$
{\rm Im}\left(\frac{1}{\psi}\frac{d\psi}{dy}\right)_{y_0} = {\rm sgn}\left(\frac{\gamma}{k}\right)\pi\left(\frac{\partial^2 U}{\partial y^2}\right)_{y_c}\left|\frac{\partial U}{\partial y}\right|^{-1}_{y_c}\frac{|\psi(y_c)|^2}{|\psi(y_0)|^2}. \tag{7.61}
$$

This equation is most useful if the interaction with the flow does not
substantially perturb $\psi(y)$ away from the still-water result $\psi(y) = \sinh(|k|y)$,
and assuming this is so provides a reasonable first approximation.

If we insert (7.61) into (7.56), where we approximate

$$
g\frac{k^2}{\omega^2} \approx g\frac{k^2}{\omega_R^2} - 2ig\frac{k^2}{\omega_R^3}\gamma,
$$

we find

$$
\gamma = -{\rm sgn}\left(\frac{\gamma}{k}\right)\pi\frac{\omega_R^3}{2gk^2}\left(\frac{\partial^2 U}{\partial y^2}\right)_{y_c}\left|\frac{\partial U}{\partial y}\right|^{-1}_{y_c}\frac{|\psi(y_c)|^2}{|\psi(y_0)|^2}. \tag{7.62}
$$

We see that either sign of $\gamma$ is allowed by our analysis. Thus the resonant
interaction between the shear flow and wave appears to lead to either
exponential growth or damping of the wave. This is inevitable because our
inviscid fluid contains no mechanism for dissipation, and its motion is
necessarily time-reversal invariant. Nonetheless, as in our discussion of “friction
without friction” in section 5.2.2, only one sign of $\gamma$ is actually observed.
This sign is determined by the initial conditions, but a rigorous explanation
of how this works mathematically is not easy, and is the subject of many
papers. These show that the correct sign is given by

$$
\gamma = -\pi\frac{\omega_R^3}{2gk^2}\left(\frac{\partial^2 U}{\partial y^2}\right)_{y_c}\left|\frac{\partial U}{\partial y}\right|^{-1}_{y_c}\frac{|\psi(y_c)|^2}{|\psi(y_0)|^2}. \tag{7.63}
$$

Since our velocity profile has $\partial^2 U/\partial y^2 < 0$, this means that the waves grow
in amplitude.

We can also establish the correct sign for $\gamma$ by computing the change of
momentum in the background flow due to the wave.⁴ The crucial element is
whether, in the neighbourhood of the critical depth, more fluid is overtaking
the wave than lagging behind it. This is exactly what the quantity
$\partial^2 U/\partial y^2$ measures.


⁴G. E. Vekstein, “Landau resonance mechanism for plasma and wind-generated water
waves,” American Journal of Physics, 66 (1998) 886-92.
7.3 Non-linear waves


Non-linear effects become important when some dimensionless measure of
the amplitude of the disturbance, say $\Delta P/P$ for a sound wave, or $\Delta h/\lambda$ for
a water wave, is no longer $\ll 1$.


7.3.1 Sound in air 


The simplest non-linear wave system is one-dimensional sound propagation 
in a gas. This problem was studied by Riemann. 

The one dimensional motion of a fluid is determined by the mass
conservation equation

$$
\partial_t\rho + \partial_x(\rho v) = 0, \tag{7.64}
$$

and Euler’s equation of motion

$$
\rho(\partial_t v + v\partial_x v) = -\partial_x P. \tag{7.65}
$$

In a fluid with equation of state $P = P(\rho)$, the speed of sound, $c$, is given by

$$
c^2 = \frac{dP}{d\rho}. \tag{7.66}
$$

It will in general depend on $P$, the speed of propagation being usually higher
when the pressure is higher.

Riemann was able to simplify these equations by defining a new
thermodynamic variable $\pi(P)$ as

$$
\pi = \int_{P_0}^{P}\frac{1}{\rho c}\,dP, \tag{7.67}
$$

where $P_0$ is the equilibrium pressure of the undisturbed air. The quantity $\pi$
obeys

$$
\frac{d\pi}{dP} = \frac{1}{\rho c}. \tag{7.68}
$$

In terms of $\pi$, Euler’s equation divided by $\rho$ becomes

$$
\partial_t v + v\partial_x v + c\,\partial_x\pi = 0, \tag{7.69}
$$

whilst the equation of mass conservation divided by $\rho/c$ becomes

$$
\partial_t\pi + v\partial_x\pi + c\,\partial_x v = 0. \tag{7.70}
$$


Figure 7.9: Non-linear characteristic curves. 


Adding and subtracting, we get Riemann’s equations

$$
\partial_t(v + \pi) + (v + c)\,\partial_x(v + \pi) = 0,
$$
$$
\partial_t(v - \pi) + (v - c)\,\partial_x(v - \pi) = 0. \tag{7.71}
$$

These assert that the Riemann invariants $v \pm \pi$ are constant along the
characteristic curves

$$
\frac{dx}{dt} = v \pm c. \tag{7.72}
$$

This tells us that signals travel at the speed $v \pm c$. In other words, they travel,
with respect to the fluid, at the local speed of sound $c$. Using the Riemann
equations, we can propagate initial data $v(x,t=0)$, $\pi(x,t=0)$ into the
future by using the method of characteristics.
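
To make the variable $\pi(P)$ concrete, it can be evaluated for a specific equation of state. The sketch below assumes an adiabatic ideal gas, $P/P_0 = (\rho/\rho_0)^{\gamma_{\rm gas}}$, which is an assumption introduced here and not taken from the text (the symbol $\gamma_{\rm gas}$ is the adiabatic index, unrelated to the growth rate $\gamma$ of section 7.2). For that gas the integral (7.67) has the closed form $2(c - c_0)/(\gamma_{\rm gas} - 1)$, and the code compares it with a midpoint-rule quadrature.

```python
import math

GAMMA_GAS = 1.4  # adiabatic index of air (assumed)
P0 = 1.0         # equilibrium pressure, arbitrary units (assumed)
RHO0 = 1.0       # equilibrium density, arbitrary units (assumed)

def rho(P):
    """Density from the assumed adiabatic law P/P0 = (rho/rho0)**gamma."""
    return RHO0 * (P / P0) ** (1.0 / GAMMA_GAS)

def sound_speed(P):
    """c from c^2 = dP/d(rho) = gamma * P / rho for the adiabatic law."""
    return math.sqrt(GAMMA_GAS * P / rho(P))

def pi_numeric(P, n=2000):
    """Midpoint-rule estimate of the integral (7.67) from P0 to P."""
    dP = (P - P0) / n
    total = 0.0
    for j in range(n):
        p_mid = P0 + (j + 0.5) * dP
        total += dP / (rho(p_mid) * sound_speed(p_mid))
    return total

def pi_closed_form(P):
    """Closed form of (7.67) for the adiabatic ideal gas."""
    return 2.0 * (sound_speed(P) - sound_speed(P0)) / (GAMMA_GAS - 1.0)

P = 1.5  # a pressure above equilibrium, chosen for illustration
print(pi_numeric(P), pi_closed_form(P))
```

Because $\pi$ is a monotonic function of $P$ for this gas, knowing $\pi$ at a point fixes $P$, and hence $c$, which is what the method of characteristics requires at each step.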

In figure 7.9 the value of $v + \pi$ is constant along the characteristic curve $C_+^A$,
which is the solution of

$$
\frac{dx}{dt} = v + c \tag{7.73}
$$

passing through A. The value of $v - \pi$ is constant along $C_-^B$, which is the
solution of

$$
\frac{dx}{dt} = v - c \tag{7.74}
$$

passing through B. Thus the values of $\pi$ and $v$ at the point P can be found if
we know the initial values of $v + \pi$ at the point A and $v - \pi$ at the point B.
Having found $v$ and $\pi$ at P we can invert $\pi(P)$ to find the pressure $P$, and
hence $c$, and so continue the characteristics into the future, as indicated by
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Xx 


Figure 7.10: Simple wave characteristics. 


the dotted lines. We need, of course, to know v and c¢ at every point along 
the characteristics C4 and C® in order to construct them, and this requires 
us to to treat every point as a “P”. The values of the dynamical quantities 
at P therefore depend on the initial data at all points lying between A and 
B. This is the domain of dependence of P 
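
The step from the data at $A$ and $B$ to the values at $P$ can be sketched in a
few lines of Python. The sketch assumes something the text does not: a
polytropic gas, $P \propto \rho^\gamma$, for which the integral (7.67) has the
closed form $\pi = 2(c - c_0)/(\gamma - 1)$. The function names and the
numerical values are illustrative only.

```python
import math

GAMMA = 1.4          # assumed adiabatic index (illustrative)
P0, RHO0 = 1.0, 1.0  # assumed equilibrium pressure and density (arbitrary units)


def rho_of_P(P):
    """Density on the assumed adiabat P/P0 = (rho/RHO0)**GAMMA."""
    return RHO0 * (P / P0) ** (1.0 / GAMMA)


def c_of_P(P):
    """Sound speed c = sqrt(dP/drho) = sqrt(GAMMA * P / rho) on the adiabat."""
    return math.sqrt(GAMMA * P / rho_of_P(P))


def pi_quadrature(P, steps=2000):
    """Riemann's variable pi(P) = int_{P0}^{P} dP'/(rho c), by the midpoint rule."""
    dP = (P - P0) / steps
    total = 0.0
    for k in range(steps):
        Pm = P0 + (k + 0.5) * dP
        total += dP / (rho_of_P(Pm) * c_of_P(Pm))
    return total


def pi_closed_form(P):
    """Closed form for the polytropic gas: pi = 2 (c - c0) / (GAMMA - 1)."""
    return 2.0 * (c_of_P(P) - c_of_P(P0)) / (GAMMA - 1.0)


def state_at_P(r_plus_at_A, r_minus_at_B):
    """Recover (v, pi) at P from v + pi carried from A and v - pi carried from B."""
    v = 0.5 * (r_plus_at_A + r_minus_at_B)
    pi = 0.5 * (r_plus_at_A - r_minus_at_B)
    return v, pi


def pressure_from_pi(pi):
    """Invert pi(P): first c = c0 + (GAMMA - 1) pi / 2, then P from c.

    On the adiabat c**2 = c0**2 * (P/P0)**((GAMMA - 1)/GAMMA).
    """
    c0 = c_of_P(P0)
    c = c0 + 0.5 * (GAMMA - 1.0) * pi
    return P0 * (c / c0) ** (2.0 * GAMMA / (GAMMA - 1.0))
```

Given the two invariants, `state_at_P` returns $v$ and $\pi$, and
`pressure_from_pi` then supplies the pressure, and hence the local sound speed
needed to extend the two characteristics by one more small step.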


A sound wave caused by a localized excess of pressure will eventually
break up into two distinct pulses, one going forwards and one going backwards.
Once these pulses are sufficiently separated that they no longer interact with
one another they are simple waves. Consider a forward-going pulse propagating
into undisturbed air. The backward characteristics are coming from the
undisturbed region where both $\pi$ and $v$ are zero. Clearly $\pi - v$ is
zero everywhere on these characteristics, and so $\pi = v$. Now
$\pi + v = 2v = 2\pi$ is constant along the forward characteristics, and so
$\pi$ and $v$ are individually constant along them. Since $\pi$ is constant, so
is $c$. With $v$ also being constant, this means that $c + v$ is constant. In
other words, for a simple wave, the characteristics are straight lines.


This simple-wave simplification contains within it the seeds of its own
destruction. Suppose we have a positive pressure pulse in a fluid whose
speed of sound increases with the pressure. Figure 7.10 shows how, with
this assumption, the straight-line characteristics travel faster in the high
pressure region, and eventually catch up with and intersect the slower-moving
characteristics. When this happens the dynamical variables will become
multivalued. How do we deal with this?

7.3.2 Shocks

Let us untangle the multivaluedness by drawing another set of pictures. Suppose
$u$ obeys the non-linear “half” wave equation
$$(\partial_t + u\,\partial_x)u = 0. \tag{7.75}$$
The velocity of propagation of the wave is therefore $u$ itself, so the parts
of the wave with large $u$ will overtake those with smaller $u$, and the wave
will “break,” as shown in figure 7.11.

Figure 7.11: A breaking non-linear wave.


Physics does not permit such multivalued solutions, and what usually happens
is that the assumptions underlying the model which gave rise to the
non-linear equation will no longer be valid. New terms should be included in
the equation which prevent the solution becoming multivalued, and instead
a steep “shock” will form.

Figure 7.12: Formation of a shock.

Examples of equations with such additional terms are Burgers’ equation
$$(\partial_t + u\,\partial_x)u = \nu\,\partial^2_{xx}u, \tag{7.76}$$
and the Korteweg-de Vries (KdV) equation (4.11), which, by a suitable rescaling
of $x$ and $t$, we can write as
$$(\partial_t + u\,\partial_x)u = \delta\,\partial^3_{xxx}u. \tag{7.77}$$
Burgers’ equation, for example, can be thought of as including the effects of
thermal conductivity, which was not included in the derivation of Riemann’s
equations. In both these modified equations, the right hand side is negligible
when $u$ is varying slowly, but it completely changes the character of the
solution when the waves steepen and try to break.
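
Before any such extra terms matter, the moment at which the unmodified equation
$(\partial_t + u\,\partial_x)u = 0$ first becomes multivalued can be estimated
directly from its straight-line characteristics $x = x_0 + u_0(x_0)t$. The
following Python sketch assumes a Gaussian initial profile, which is our own
illustrative choice, and uses the fact that neighbouring characteristics first
cross at $t_b = -1/\min u_0'$.

```python
import math


def u0(x):
    """Assumed initial profile (illustrative): a smooth positive hump."""
    return math.exp(-x * x)


def breaking_time(profile, xs):
    """Earliest crossing time of the characteristics x = x0 + profile(x0) * t.

    Characteristics starting at x0 and x0 + dx meet when
    1 + t * profile'(x0) = 0, so the first crossing is at
    t_b = -1 / min(profile'), with the slope estimated by central differences.
    """
    h = 1e-5
    min_slope = min((profile(x + h) - profile(x - h)) / (2 * h) for x in xs)
    if min_slope >= 0:
        return math.inf  # u nowhere decreases to the right: no breaking
    return -1.0 / min_slope


def is_single_valued(profile, xs, t):
    """True if the map x0 -> x0 + profile(x0) * t is still monotonic on the grid."""
    positions = [x + profile(x) * t for x in xs]
    return all(b > a for a, b in zip(positions, positions[1:]))


xs = [-5.0 + 0.001 * k for k in range(10001)]
t_b = breaking_time(u0, xs)
```

For this profile the steepest descent is at $x_0 = 1/\sqrt{2}$, so the estimate
can be compared with the exact value $t_b = \sqrt{e/2}$.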

Although these extra terms are essential for the stabilization of the shock, 
once we know that such a discontinuous solution has formed, we can find 
many of its properties — for example the propagation velocity — from general 
principles, without needing their detailed form. All we need is to know what 
conservation laws are applicable. 

Multiplying $(\partial_t + u\,\partial_x)u = 0$ by $u^{n-1}$, we deduce that
$$\partial_t\left\{\frac{1}{n}u^n\right\}
  + \partial_x\left\{\frac{1}{n+1}u^{n+1}\right\} = 0, \tag{7.78}$$
and this implies that
$$Q_n = \int_{-\infty}^{\infty} u^n\,dx \tag{7.79}$$
is time independent. There are infinitely many of these conservation laws,
one for each $n$. Suppose that the $n$-th conservation law continues to hold
even in the presence of the shock, and that the discontinuity is at $X(t)$.
Then
$$\frac{d}{dt}\left\{\int_{-\infty}^{X(t)} u^n\,dx
  + \int_{X(t)}^{\infty} u^n\,dx\right\} = 0. \tag{7.80}$$
This is equal to
$$u^n_-(X)\dot X - u^n_+(X)\dot X
  + \int_{-\infty}^{X(t)} \partial_t u^n\,dx
  + \int_{X(t)}^{\infty} \partial_t u^n\,dx = 0, \tag{7.81}$$
where $u^n_-(X) \equiv u^n(X - \epsilon)$ and
$u^n_+(X) \equiv u^n(X + \epsilon)$. Now, using
$(\partial_t + u\,\partial_x)u = 0$ in the regions away from the shock, where
it is reliable, we can write this as
$$(u^n_+ - u^n_-)\dot X
  = -\frac{n}{n+1}\int_{-\infty}^{X(t)} \partial_x u^{n+1}\,dx
    -\frac{n}{n+1}\int_{X(t)}^{\infty} \partial_x u^{n+1}\,dx
  = \left(\frac{n}{n+1}\right)(u^{n+1}_+ - u^{n+1}_-). \tag{7.82}$$
The velocity at which the shock moves is therefore
$$\dot X = \left(\frac{n}{n+1}\right)
  \frac{(u^{n+1}_+ - u^{n+1}_-)}{(u^n_+ - u^n_-)}. \tag{7.83}$$
Since the shock can only move at one velocity, only one of the infinitely many
conservation laws can continue to hold in the modified theory!
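
To see how strongly the choice of conservation law matters, one can evaluate
the shock velocity $\dot X = \frac{n}{n+1}(u_+^{n+1} - u_-^{n+1})/(u_+^n - u_-^n)$
for a few values of $n$. The short Python sketch below does this; the jump
values $u_- = 2$, $u_+ = 1$ are arbitrary illustrative numbers.

```python
def shock_speed(n, u_minus, u_plus):
    """Shock velocity implied by conservation of Q_n = int u**n dx."""
    numerator = u_plus ** (n + 1) - u_minus ** (n + 1)
    denominator = u_plus ** n - u_minus ** n
    return (n / (n + 1)) * numerator / denominator


# Illustrative jump from u_- = 2 (behind the shock) to u_+ = 1 (ahead of it).
speeds = {n: shock_speed(n, 2.0, 1.0) for n in (1, 2, 3)}
```

For this jump the three laws give the velocities 3/2, 14/9 and 45/28, so they
cannot all hold at once.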
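Example: Burgers’ equation. From
$$(\partial_t + u\,\partial_x)u = \nu\,\partial^2_{xx}u \tag{7.84}$$
we deduce that
$$\partial_t u + \partial_x\left\{\frac12 u^2 - \nu\,\partial_x u\right\} = 0,
  \tag{7.85}$$
so that $Q_1 = \int u\,dx$ is conserved, but further investigation shows that
no other conservation law survives. The shock speed is therefore
$$\dot X = \frac12\,\frac{(u_+^2 - u_-^2)}{(u_+ - u_-)}
  = \frac12(u_+ + u_-). \tag{7.86}$$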


Example: KdV equation. From
$$(\partial_t + u\,\partial_x)u = \delta\,\partial^3_{xxx}u \tag{7.87}$$
we deduce that
$$\partial_t u + \partial_x\left\{\frac12 u^2
  - \delta\,\partial^2_{xx}u\right\} = 0,$$
$$\partial_t\left\{\frac12 u^2\right\} + \partial_x\left\{\frac13 u^3
  - \delta\,u\,\partial^2_{xx}u + \frac12\delta(\partial_x u)^2\right\} = 0,$$
$$\vdots$$
where the dots refer to an infinite sequence of (not exactly obvious)
conservation laws. Since more than one conservation law survives, the KdV
equation cannot have shock-like solutions. Instead, the steepening wave breaks
up into a sequence of solitons.

Example: Hydraulic Jump, or Bore

Figure 7.13: A Hydraulic Jump.

A stationary hydraulic jump is a place in a stream where the fluid abruptly
increases in depth from $h_1$ to $h_2$, and simultaneously slows down from
supercritical (faster than wave-speed) flow to subcritical (slower than
wave-speed) flow. Such jumps are commonly seen near weirs, and white-water
rapids.⁵ A circular hydraulic jump is easily created in your kitchen sink. The
moving equivalent is the tidal bore.

⁵ The breaking crest of Frost’s “white wave” is probably as much an example of
a hydraulic jump as of a smooth downstream wake.

The equations governing uniform (meaning that $v$ is independent of the
depth) flow in channels are mass conservation
$$\partial_t h + \partial_x\{hv\} = 0, \tag{7.88}$$
and Euler’s equation
$$\partial_t v + v\,\partial_x v = -\partial_x\{gh\}. \tag{7.89}$$
We could manipulate these into the Riemann form, and work from there, but
it is more direct to combine them to derive the momentum conservation law
$$\partial_t\{hv\} + \partial_x\left\{hv^2 + \frac12 gh^2\right\} = 0.
  \tag{7.90}$$
From Euler’s equation, assuming steady flow, $\dot v = 0$, we can also deduce
Bernoulli’s equation
$$\frac12 v^2 + gh = \text{const.}, \tag{7.91}$$
which is an energy conservation law. At the jump, mass and momentum
must be conserved:
$$h_1v_1 = h_2v_2,$$
$$h_1v_1^2 + \frac12 gh_1^2 = h_2v_2^2 + \frac12 gh_2^2, \tag{7.92}$$
and $v_2$ may be eliminated to find
$$v_1^2 = \frac12 g\left(\frac{h_2}{h_1}\right)(h_1 + h_2). \tag{7.93}$$
A change of frame reveals that $v_1$ is the speed at which a wall of water of
height $h = (h_2 - h_1)$ would propagate into stationary water of depth $h_1$.

Bernoulli’s equation is inconsistent with the two equations we have used,
and so
$$\frac12 v_1^2 + gh_1 \ne \frac12 v_2^2 + gh_2. \tag{7.94}$$
This means that energy is being dissipated: for strong jumps, the fluid
downstream is turbulent. For weaker jumps, the energy is radiated away in a
train of waves, the so-called “undular bore”.
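
The jump conditions are easy to check numerically. The Python sketch below
takes the upstream speed in the form
$v_1^2 = \frac12 g(h_2/h_1)(h_1 + h_2)$, obtains $v_2$ from mass conservation,
and compares the Bernoulli quantity $\frac12 v^2 + gh$ on the two sides. The
depths are illustrative values, not data from the text.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def jump_speeds(h1, h2):
    """Upstream and downstream speeds for a stationary jump from depth h1 to h2."""
    v1 = math.sqrt(0.5 * G * (h2 / h1) * (h1 + h2))
    v2 = h1 * v1 / h2  # mass conservation: h1 v1 = h2 v2
    return v1, v2


def momentum_flux(h, v):
    """The conserved flux h v^2 + g h^2 / 2 of the momentum law."""
    return h * v * v + 0.5 * G * h * h


def bernoulli_head(h, v):
    """The quantity v^2/2 + g h that Bernoulli's equation would keep constant."""
    return 0.5 * v * v + G * h


h1, h2 = 0.1, 0.3  # illustrative depths in metres
v1, v2 = jump_speeds(h1, h2)
head_loss = bernoulli_head(h1, v1) - bernoulli_head(h2, v2)
```

With these depths the momentum flux agrees on the two sides, while the
Bernoulli quantity drops across the jump by $g(h_2 - h_1)^3/(4h_1h_2)$, which
is positive whenever the depth increases.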

Example: Shock Wave in Air. At a shock wave in air we have conservation
of mass
$$\rho_1v_1 = \rho_2v_2, \tag{7.95}$$
and momentum
$$\rho_1v_1^2 + P_1 = \rho_2v_2^2 + P_2. \tag{7.96}$$
In this case, however, Bernoulli’s equation does hold,⁶ so we also have
$$\frac12 v_1^2 + h_1 = \frac12 v_2^2 + h_2. \tag{7.97}$$
Here, $h$ is the specific enthalpy ($U + PV$ per unit mass). Entropy, though,
is not conserved, so we cannot use $PV^\gamma = \text{const.}$ across the
shock. From mass and momentum conservation alone we find
$$v_1^2 = \left(\frac{\rho_2}{\rho_1}\right)
  \frac{P_2 - P_1}{\rho_2 - \rho_1}. \tag{7.98}$$
For an ideal gas with $c_p/c_v = \gamma$, we can use energy conservation to
eliminate the densities, and find
$$v_1^2 = c_0^2\left(1 + \frac{\gamma + 1}{2\gamma}\,
  \frac{P_2 - P_1}{P_1}\right). \tag{7.99}$$
Here, $c_0$ is the speed of sound in the undisturbed gas.

⁶ Recall that enthalpy is conserved in a throttling process even in the
presence of dissipation. Bernoulli’s equation for a gas is the generalization
of this thermodynamic result to include the kinetic energy of the gas. The
difference between the shock wave in air, where Bernoulli holds, and the
hydraulic jump, where it does not, is that the enthalpy of the gas keeps track
of the lost mechanical energy, which has been absorbed by the internal degrees
of freedom. The Bernoulli equation for channel flow keeps track only of the
mechanical energy of the mean flow.
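
As a consistency check on the elimination of the densities, the Python sketch
below compares the mass-and-momentum result
$v_1^2 = (\rho_2/\rho_1)(P_2 - P_1)/(\rho_2 - \rho_1)$ with the pressure-only
form $v_1^2 = c_0^2[1 + (\gamma + 1)(P_2 - P_1)/(2\gamma P_1)]$. It assumes an
ideal gas with $h = \gamma P/((\gamma - 1)\rho)$ and $\gamma = 1.4$, and uses
the standard Rankine-Hugoniot density ratio that follows from the three
conservation laws. The numerical inputs are illustrative.

```python
GAMMA = 1.4  # assumed ratio of specific heats for air


def density_ratio(P1, P2):
    """rho2/rho1 across the shock for an ideal gas (Rankine-Hugoniot adiabat)."""
    return ((GAMMA + 1) * P2 + (GAMMA - 1) * P1) / ((GAMMA - 1) * P2 + (GAMMA + 1) * P1)


def v1_squared_from_densities(P1, rho1, P2):
    """Upstream speed squared from mass and momentum conservation only."""
    rho2 = rho1 * density_ratio(P1, P2)
    return (rho2 / rho1) * (P2 - P1) / (rho2 - rho1)


def v1_squared_from_pressures(P1, rho1, P2):
    """Upstream speed squared with the densities eliminated via energy conservation."""
    c0_squared = GAMMA * P1 / rho1  # sound speed squared in the undisturbed gas
    return c0_squared * (1 + (GAMMA + 1) / (2 * GAMMA) * (P2 - P1) / P1)


def energy_mismatch(P1, rho1, P2):
    """Difference of v^2/2 + h between the two sides; should vanish."""
    rho2 = rho1 * density_ratio(P1, P2)
    v1_sq = v1_squared_from_densities(P1, rho1, P2)
    v2_sq = v1_sq * (rho1 / rho2) ** 2  # mass conservation
    h1 = GAMMA / (GAMMA - 1) * P1 / rho1
    h2 = GAMMA / (GAMMA - 1) * P2 / rho2
    return (0.5 * v1_sq + h1) - (0.5 * v2_sq + h2)
```

For $P_1 = \rho_1 = 1$ and $P_2 = 2$ both expressions give $v_1^2 = 2.6$,
compared with $c_0^2 = 1.4$, so the shock is supersonic with respect to the
gas ahead of it.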


7.3.3 Weak solutions 


We want to make mathematically precise the sense in which a function $u$
with a discontinuity can be a solution to the differential equation
$$\partial_t\left\{\frac{1}{n}u^n\right\}
  + \partial_x\left\{\frac{1}{n+1}u^{n+1}\right\} = 0, \tag{7.100}$$
even though the equation is surely meaningless if the functions to which the
derivatives are being applied are not in fact differentiable.

We could play around with distributions like the Heaviside step function
or the Dirac delta, but this is unsafe for non-linear equations, because the
product of two distributions is generally not meaningful. What we do is
introduce a new concept. We say that $u$ is a weak solution to (7.100) if
$$\int_{\mathbb{R}^2} dx\,dt\left\{u^n\,\partial_t\varphi
  + \frac{n}{n+1}u^{n+1}\,\partial_x\varphi\right\} = 0, \tag{7.101}$$
for all test functions $\varphi$ in some suitable space $\mathcal{T}$. This
equation has formally been obtained from (7.100) by multiplying it by
$\varphi(x,t)$, integrating over all space-time, and then integrating by parts
to move the derivatives off $u$, and onto the smooth function $\varphi$. If $u$
is assumed smooth then all these manipulations are legitimate and the new
equation (7.101) contains no new information. A conventional solution to
(7.100) is therefore also a weak solution. The new formulation (7.101),
however, admits solutions in which $u$ has shocks.

Figure 7.14: The geometry of the domains to the right and left of a jump.

Let us see what is required of a weak solution if we assume that $u$ is
everywhere smooth except for a single jump from $u_-(t)$ to $u_+(t)$ at the
point $X(t)$. Let $D_\pm$ be the regions to the right and left of the jump, as
shown in figure 7.14. Then the weak-solution condition (7.101) becomes
$$0 = \int_{D_-} dx\,dt\left\{u^n\,\partial_t\varphi
      + \frac{n}{n+1}u^{n+1}\,\partial_x\varphi\right\}
    + \int_{D_+} dx\,dt\left\{u^n\,\partial_t\varphi
      + \frac{n}{n+1}u^{n+1}\,\partial_x\varphi\right\}. \tag{7.102}$$
Let
$$\mathbf{n} = \left(\frac{1}{\sqrt{1 + |\dot X|^2}},\;
  \frac{-\dot X}{\sqrt{1 + |\dot X|^2}}\right) \tag{7.103}$$
be the unit outward normal to $D_-$, then, using the divergence theorem, we
have
$$\int_{D_-} dx\,dt\left\{u^n\,\partial_t\varphi
    + \frac{n}{n+1}u^{n+1}\,\partial_x\varphi\right\}
  = \int_{D_-} dx\,dt\left\{-\varphi\left(\partial_t u^n
    + \frac{n}{n+1}\partial_x u^{n+1}\right)\right\}
  + \int_{\partial D_-} dt\left\{\varphi\left(-\dot X(t)\,u^n_-
    + \frac{n}{n+1}u^{n+1}_-\right)\right\}. \tag{7.104}$$
Here we have written the integration measure over the boundary as
$$ds = \sqrt{1 + |\dot X|^2}\,dt. \tag{7.105}$$
Performing the same manoeuvre for $D_+$, and observing that $\varphi$ can be
any smooth function, we deduce that

i) $\partial_t u^n + \frac{n}{n+1}\partial_x u^{n+1} = 0$ within $D_\pm$.

ii) $\dot X(u^n_+ - u^n_-) = \frac{n}{n+1}(u^{n+1}_+ - u^{n+1}_-)$ on $X(t)$.
The reasoning here is identical to that in chapter one, where we considered 
variations at endpoints to obtain natural boundary conditions. We therefore 
end up with the same equations for the motion of the shock as before. 

The notion of weak solutions is widely used in applied mathematics, and it 
is the principal ingredient of the finite element method of numerical analysis 
in continuum dynamics. 
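
A numerical scheme that updates cell averages using only the fluxes through
the cell faces respects the integrated, weak form of a conservation law, and so
moves a shock at the speed demanded by that law. The Python sketch below
illustrates this for the $n = 1$ law, $\partial_t u + \partial_x(u^2/2) = 0$.
It uses a first-order Godunov finite-volume scheme (our choice for brevity, and
not the finite element method mentioned above); the grid, the step initial data
and the function names are illustrative assumptions.

```python
def burgers_flux(u):
    """Flux of the n = 1 conservation law, f(u) = u^2 / 2."""
    return 0.5 * u * u


def godunov_flux(ul, ur):
    """Exact Riemann-problem flux at an interface with states ul (left), ur (right)."""
    if ul > ur:  # shock: take the state on the upwind side of the shock
        return burgers_flux(ul) if ul + ur > 0 else burgers_flux(ur)
    if ul > 0:   # rarefaction moving entirely to the right
        return burgers_flux(ul)
    if ur < 0:   # rarefaction moving entirely to the left
        return burgers_flux(ur)
    return 0.0   # rarefaction fan straddling the interface


def evolve(u, dx, t_final, cfl=0.5):
    """Advance the cell averages to t_final, holding the end states fixed."""
    u = list(u)
    t = 0.0
    while t < t_final - 1e-12:
        umax = max(abs(value) for value in u)
        dt = min(cfl * dx / max(umax, 1e-12), t_final - t)
        padded = [u[0]] + u + [u[-1]]  # one ghost cell at each end
        flux = [godunov_flux(padded[i], padded[i + 1])
                for i in range(len(padded) - 1)]
        u = [u[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(len(u))]
        t += dt
    return u


def shock_position(u, x_left, dx, u_minus, u_plus):
    """Locate the jump from the conserved integral of u.

    Solves total = u_minus * (X - x_left) + u_plus * (x_left + length - X).
    """
    total = sum(u) * dx
    length = len(u) * dx
    return x_left + (total - u_plus * length) / (u_minus - u_plus)


dx, x_left, cells = 0.01, -1.0, 400
centres = [x_left + (i + 0.5) * dx for i in range(cells)]
u_init = [2.0 if x < 0 else 0.0 for x in centres]  # step from u_- = 2 to u_+ = 0
u_final = evolve(u_init, dx, t_final=1.0)
X = shock_position(u_final, x_left, dx, 2.0, 0.0)
```

Starting from a step at $x = 0$ with $u_- = 2$ and $u_+ = 0$, the computed jump
sits at $X = 1$ after unit time, in agreement with the speed
$\frac12(u_+ + u_-) = 1$ obtained from condition ii) with $n = 1$.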


7.4 Solitons 


A localized disturbance in a dispersive medium soon falls apart, since its 
various frequency components travel at differing speeds. At the same time, 
non-linear effects will distort the wave profile. In some systems, however, 
these effects of dispersion and non-linearity can compensate each other and 
give rise to solitons—stable solitary waves which propagate for long distances 
without changing their form. Not all equations possessing wave-like solutions
also possess solitary wave solutions. The best known examples of equations
that do are:
1) The Korteweg-de Vries (KdV) equation, which in the form
$$\frac{\partial u}{\partial t} + 6u\frac{\partial u}{\partial x}
  = -\frac{\partial^3 u}{\partial x^3} \tag{7.106}$$
has a solitary wave solution
$$u = 2\alpha^2\,\mathrm{sech}^2(\alpha x - 4\alpha^3 t) \tag{7.107}$$
which travels at speed $4\alpha^2$. The larger the amplitude, therefore, the
faster the solitary wave travels. This equation applies to steep waves
in shallow water.

2) The non-linear Schrödinger (NLS) equation with attractive interactions
$$i\frac{\partial\psi}{\partial t}
  = -\frac{1}{2m}\frac{\partial^2\psi}{\partial x^2}
    - \lambda|\psi|^2\psi, \tag{7.108}$$
where $\lambda > 0$. It has solitary-wave solution
$$\psi = e^{ikx - i\omega t}\sqrt{\frac{\alpha}{m\lambda}}\,
  \mathrm{sech}\,\sqrt{\alpha}(x - Ut), \tag{7.109}$$
where
$$k = mU, \qquad \omega = \frac12 mU^2 - \frac{\alpha}{2m}. \tag{7.110}$$
In this case, the speed is independent of the amplitude, and the moving
solution can be obtained from a stationary one by means of a Galilean
boost. The non-linear equation for the stationary wavepacket may be
solved by observing that
$$(-\partial_x^2 - 2\,\mathrm{sech}^2 x)\,\psi_0 = -\psi_0 \tag{7.111}$$
where $\psi_0(x) = \mathrm{sech}\,x$. This is the bound-state of the
Pöschl-Teller equation that we have met several times before. The non-linear
Schrödinger equation describes many systems, including the dynamics of
tornadoes, where the solitons manifest as the knot-like kinks sometimes seen
winding their way up thin funnel clouds.⁷
3) The sine-Gordon (SG) equation is
$$\frac{\partial^2\varphi}{\partial t^2}
  - \frac{\partial^2\varphi}{\partial x^2}
  + \frac{m^2}{\beta}\sin\beta\varphi = 0. \tag{7.112}$$
This has solitary-wave solutions
$$\varphi = \frac{4}{\beta}\tan^{-1}\left\{e^{\pm m\gamma(x - Ut)}\right\},
  \tag{7.113}$$
where $\gamma = (1 - U^2)^{-1/2}$ and $|U| < 1$. The velocity is not related to
the amplitude, and the moving soliton can again be obtained by boosting
a stationary soliton. The boost is now a Lorentz transformation, and
so we only get subluminal solitons, whose width is Lorentz contracted
by the usual relativistic factor of $\gamma$. The sine-Gordon equation
describes, for example, the evolution of light pulses whose frequency is in
resonance with an atomic transition in the propagation medium.⁸
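
A quick way to gain confidence in closed-form solitary waves is to substitute
them into their equations numerically. The Python sketch below does this with
central finite differences. It assumes the normalizations
$u_t + 6uu_x + u_{xxx} = 0$ with $u = 2\alpha^2\,\mathrm{sech}^2(\alpha x - 4\alpha^3 t)$
for KdV, $i\psi_t = -\psi_{xx}/2m - \lambda|\psi|^2\psi$ with amplitude
$\sqrt{\alpha/m\lambda}$, $k = mU$ and $\omega = \frac12 mU^2 - \alpha/2m$ for
NLS, and $\varphi_{tt} - \varphi_{xx} + (m^2/\beta)\sin\beta\varphi = 0$ for
sine-Gordon. The step size `H` and all parameter values are illustrative
choices.

```python
import cmath
import math

H = 2e-3  # finite-difference step (assumed small enough for these profiles)


def d1(f, s):
    """Central first derivative of f at s."""
    return (f(s + H) - f(s - H)) / (2 * H)


def d2(f, s):
    """Central second derivative of f at s."""
    return (f(s + H) - 2 * f(s) + f(s - H)) / H ** 2


def d3(f, s):
    """Central third derivative of f at s."""
    return (f(s + 2 * H) - 2 * f(s + H) + 2 * f(s - H) - f(s - 2 * H)) / (2 * H ** 3)


def sech(s):
    return 1.0 / math.cosh(s)


def kdv_residual(x, t, alpha):
    """u_t + 6 u u_x + u_xxx for the assumed KdV solitary wave."""
    def u(x_, t_):
        return 2 * alpha ** 2 * sech(alpha * x_ - 4 * alpha ** 3 * t_) ** 2
    u_t = d1(lambda s: u(x, s), t)
    u_x = d1(lambda s: u(s, t), x)
    u_xxx = d3(lambda s: u(s, t), x)
    return u_t + 6 * u(x, t) * u_x + u_xxx


def nls_residual(x, t, alpha, m, lam, U):
    """i psi_t + psi_xx/(2m) + lam |psi|^2 psi for the assumed NLS solitary wave."""
    k = m * U
    omega = 0.5 * m * U ** 2 - alpha / (2 * m)
    amplitude = math.sqrt(alpha / (m * lam))

    def psi(x_, t_):
        envelope = amplitude * sech(math.sqrt(alpha) * (x_ - U * t_))
        return cmath.exp(1j * (k * x_ - omega * t_)) * envelope
    psi_t = d1(lambda s: psi(x, s), t)
    psi_xx = d2(lambda s: psi(s, t), x)
    p = psi(x, t)
    return 1j * psi_t + psi_xx / (2 * m) + lam * abs(p) ** 2 * p


def sg_residual(x, t, m, beta, U, sign=1):
    """phi_tt - phi_xx + (m^2/beta) sin(beta phi) for the sine-Gordon soliton."""
    gamma = 1.0 / math.sqrt(1.0 - U ** 2)

    def phi(x_, t_):
        return (4 / beta) * math.atan(math.exp(sign * m * gamma * (x_ - U * t_)))
    phi_tt = d2(lambda s: phi(x, s), t)
    phi_xx = d2(lambda s: phi(s, t), x)
    return phi_tt - phi_xx + (m ** 2 / beta) * math.sin(beta * phi(x, t))
```

Each residual should be zero up to discretization error, which for the step
used here is far smaller than the individual terms in the equations.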
In the case of the sine-Gordon soliton, the origin of the solitary wave is
particularly easy to understand, as it can be realized as a “twist” in a chain
of coupled pendulums. The handedness of the twist determines whether we
take the + or − sign in the solution (7.113).

Figure 7.15: A sine-Gordon solitary wave as a twist in a ribbon of coupled
pendulums.

⁷ H. Hasimoto, J. Fluid Mech. 51 (1972) 477-485.
⁸ See G. L. Lamb, Rev. Mod. Phys. 43 (1971) 99, for a nice review.


The existence of solitary-wave solutions is interesting in its own right.
It was the fortuitous observation of such a wave by John Scott Russell on
the Union Canal, near Hermiston in Scotland, that founded the subject.⁹
Even more remarkable was Scott Russell’s subsequent discovery (made in a
specially constructed trough in his garden) of what is now called the soliton
property: two colliding solitary waves interact in a complicated manner yet
emerge from the encounter with their form unchanged, having suffered no
more than a slight time delay. Each of the three equations given above has
exact multi-soliton solutions which show this phenomenon.

After languishing for more than a century, soliton theory has grown to
be a huge subject. It is, for example, studied by electrical engineers who
use soliton pulses in fibre-optic communications. No other type of signal
can propagate through thousands of kilometers of undersea cable without
degradation. Solitons, or “quantum lumps”, are also important in particle
physics. The nucleon can be thought of as a knotted soliton (in this case
called a “skyrmion”) in the pion field, and gauge-field monopole solitons
appear in many string and field theories. The soliton equations themselves
are aristocrats among partial differential equations, with ties into almost
every other branch of mathematics.

⁹ “I was observing the motion of a boat which was rapidly drawn along a narrow
channel by a pair of horses, when the boat suddenly stopped - not so the mass
of water in the channel which it had put in motion; it accumulated round the
prow of the vessel in a state of violent agitation, then suddenly leaving it
behind, rolled forward with great velocity, assuming the form of a large
solitary elevation, a rounded, smooth and well-defined heap of water, which
continued its course along the channel apparently without change of form or
diminution of speed. I followed it on horseback, and overtook it still rolling
on at a rate of some eight or nine miles an hour, preserving its original
figure some thirty feet long and a foot to a foot and a half in height. Its
height gradually diminished, and after a chase of one or two miles I lost it in
the windings of the channel. Such, in the month of August 1834, was my first
chance interview with that singular and beautiful phenomenon which I have
called the Wave of Translation.” —John Scott Russell, 1844.

Practical Illustration: Solitons in Optical Fibres. We wish to transmit
picosecond pulses of light with a carrier frequency $\omega_0$. Suppose that
the dispersive properties of the fibre are such that the associated wavenumber
for frequencies near $\omega_0$ can be expanded as
$$k = \Delta k + k_0 + \beta_1(\omega - \omega_0)
  + \frac12\beta_2(\omega - \omega_0)^2 + \cdots. \tag{7.114}$$
Here, $\beta_1$ is the reciprocal of the group velocity, and $\beta_2$ is a
parameter called the group velocity dispersion (GVD). The term $\Delta k$
parameterizes the change in refractive index due to non-linear effects. It is
proportional to the mean-square of the electric field. Let us write the
electric field as
$$E(z,t) = A(z,t)\,e^{ik_0z - i\omega_0t}, \tag{7.115}$$
where $A(z,t)$ is a slowly varying envelope function. When we transform from
Fourier variables to space and time we have
$$(\omega - \omega_0) \to i\frac{\partial}{\partial t}, \qquad
  (k - k_0) \to -i\frac{\partial}{\partial z}, \tag{7.116}$$
and so the equation determining $A$ becomes
$$-i\frac{\partial A}{\partial z} = i\beta_1\frac{\partial A}{\partial t}
  - \frac{\beta_2}{2}\frac{\partial^2 A}{\partial t^2} + \Delta k\,A.
  \tag{7.117}$$
If we set $\Delta k = \gamma|A|^2$, where $\gamma$ is normally positive, we
have
$$i\left(\frac{\partial A}{\partial z}
  + \beta_1\frac{\partial A}{\partial t}\right)
  = \frac{\beta_2}{2}\frac{\partial^2 A}{\partial t^2} - \gamma|A|^2A.
  \tag{7.118}$$
We may get rid of the first-order time derivative by transforming to a frame
moving at the group velocity. We do this by setting
$$\tau = t - \beta_1z, \qquad \zeta = z, \tag{7.119}$$
and using the chain rule, as we did for the Galilean transformation in
homework set 0. The equation for $A$ ends up being
$$i\frac{\partial A}{\partial\zeta}
  = \frac{\beta_2}{2}\frac{\partial^2 A}{\partial\tau^2} - \gamma|A|^2A.
  \tag{7.120}$$
This looks like our non-linear Schrödinger equation, but with the role of
space and time interchanged! Also, the coefficient of the second derivative
has the wrong sign so, to make it coincide with the Schrödinger equation we
studied earlier, we must have $\beta_2 < 0$. When this condition holds, we are
said to be in the “anomalous dispersion” regime, although this is rather
a misnomer since it is the group refractive index,
$N_g = c/v_{\rm group}$, that is decreasing with frequency, not the ordinary
refractive index. For pure SiO₂ glass, $\beta_2$ is negative for wavelengths
greater than 1.27 μm. We therefore have anomalous dispersion in the
technologically important region near 1.55 μm, where the glass is most
transparent. In the anomalous dispersion regime we have solitons with
$$A(\zeta,\tau) = e^{i\alpha|\beta_2|\zeta/2}
  \sqrt{\frac{|\beta_2|\alpha}{\gamma}}\,\mathrm{sech}\,\sqrt{\alpha}(\tau),
  \tag{7.121}$$
leading to
$$E(z,t) = \sqrt{\frac{|\beta_2|\alpha}{\gamma}}\,
  \mathrm{sech}\,\sqrt{\alpha}(t - \beta_1z)\,
  e^{i\alpha|\beta_2|z/2}\,e^{ik_0z - i\omega_0t}. \tag{7.122}$$
This equation describes a pulse propagating at $\beta_1^{-1}$, which is the
group velocity.

Exercise 7.1: Find the expression for the sine-Gordon soliton, by first showing
that the static sine-Gordon equation
$$-\frac{\partial^2\varphi}{\partial x^2}
  + \frac{m^2}{\beta}\sin\beta\varphi = 0$$
implies that
$$\frac12\varphi'^2 + \frac{m^2}{\beta^2}\cos\beta\varphi = \text{const.},$$
and solving this equation (for a suitable choice of the constant) by separation
of variables. Next, show that if $f(x)$ is a solution of the static equation,
then $f(\gamma(x - Ut))$, $\gamma = (1 - U^2)^{-1/2}$, $|U| < 1$, is a solution
of the time-dependent equation.


Exercise 7.2: Lax pair for the non-linear Schrödinger equation. Let $L$ be the
matrix differential operator

and let $P$ be the matrix
$$P = \begin{pmatrix} i|\chi|^2 & \chi'^* \\ -\chi' & -i|\chi|^2 \end{pmatrix}.$$
Show that the equation
$$\dot L = [L, P]$$
is equivalent to the non-linear Schrödinger equation
$$i\dot\chi = -\chi'' - 2|\chi|^2\chi.$$


7.5 Further exercises and problems 


Here are some further problems on non-linear and dispersive waves: 


Problem 7.3: The Equation of Telegraphy. Oliver Heaviside’s equations relating
the voltage $v(x,t)$ and current $i(x,t)$ in a transmission line are
$$L\frac{\partial i}{\partial t} + Ri = -\frac{\partial v}{\partial x},$$
$$C\frac{\partial v}{\partial t} + Gv = -\frac{\partial i}{\partial x}.$$
Here $R$, $C$, $L$ and $G$ are respectively the resistance, capacitance,
inductance, and leakance of each unit length of the line.

a) Show that Heaviside’s equations lead to $v(x,t)$ obeying
$$LC\frac{\partial^2 v}{\partial t^2}
  + (LG + RC)\frac{\partial v}{\partial t} + RGv
  = \frac{\partial^2 v}{\partial x^2},$$
and also to a similar equation for $i(x,t)$.
b) Seek a travelling-wave solution of the form
$$v(x,t) = v_0\,e^{i(kx - \omega t)}, \qquad
  i(x,t) = i_0\,e^{i(kx - \omega t)},$$
and find the dispersion equation relating $\omega$ and $k$. From this relation,
show that signals propagate undistorted (i.e. with frequency-independent
attenuation) at speed $1/\sqrt{LC}$ provided that the Heaviside condition
$RC = LG$ is satisfied.
c) Show that the characteristic impedance $Z \equiv v_0/i_0$ of the
transmission line is given by
$$Z = \sqrt{\frac{R + i\omega L}{G + i\omega C}}.$$
Deduce that the characteristic impedance is frequency independent if the
Heaviside condition is satisfied.


In practical applications, the Heaviside condition can be satisfied by periodi- 
cally inserting extra inductors—known as loading coils—into the line. 


Problem 7.4: Pantograph Drag. A high-speed train picks up its electrical
power via a pantograph from an overhead line. The locomotive travels at
speed $U$ and the pantograph exerts a constant vertical force $F$ on the power
line.

Figure 7.16: A high-speed train.

We make the usual small amplitude approximations and assume (not
unrealistically) that the line is supported in such a way that its vertical
displacement obeys an inhomogeneous Klein-Gordon equation
$$\rho\ddot y - Ty'' + \rho\Omega^2 y = F\delta(x - Ut),$$
with $c = \sqrt{T/\rho}$, the velocity of propagation of short-wavelength
transverse waves on the overhead cable.

a) Assume that $U < c$ and solve for the steady state displacement of the
cable about the pickup point. (Hint: the disturbance is time-independent
when viewed from the train.)

b) Now assume that $U > c$. Again find an expression for the displacement
of the cable. (The same hint applies, but the physically appropriate
boundary conditions are very different!)

c) By equating the rate at which wave-energy
$$E = \int\left\{\frac12\rho\dot y^2 + \frac12 Ty'^2
  + \frac12\rho\Omega^2y^2\right\}dx$$
is being created to the rate at which the locomotive is doing work, calculate
the wave-drag on the train. In particular, show that there is no
drag at all until $U$ exceeds $c$. (Hint: While the front end of the wake is
moving at speed $U$, the trailing end of the wake is moving forward at the
group velocity of the wave-train.)

d) By carefully considering the force the pantograph exerts on the overhead
cable, again calculate the induced drag. You should get the same answer
as in part c). (Hint: To the order needed for the calculation, the tension
in the cable is the same before and after the train has passed, but the
direction in which the tension acts is different. The force $F$ is therefore
not exactly vertical, but has a small forward component. Don’t forget
that the resultant of the forces is accelerating the cable.)

This problem of wake formation and drag is related both to Čerenkov radiation
and to the Landau criterion for superfluidity.


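Exercise 7.5: Inertial waves. A rotating tank of incompressible ($\rho \equiv 1$)
fluid can host waves whose restoring force is provided by angular momentum
conservation. Suppose the fluid velocity at the point $\mathbf{r}$ is given by
$$\mathbf{v}(\mathbf{r},t) = \mathbf{u}(\mathbf{r},t)
  + \mathbf{\Omega}\times\mathbf{r},$$
where $\mathbf{u}$ is a perturbation imposed on the rigid rotation of the fluid
at angular velocity $\mathbf{\Omega}$.

a) Show that when viewed from a co-ordinate frame rotating with the fluid
we have
$$\left(\frac{\partial\mathbf{u}}{\partial t}\right)_{\rm lab}
  = \left(\left(\frac{\partial}{\partial t} + \mathbf{\Omega}\times{}
    - (\mathbf{\Omega}\times\mathbf{r})\cdot\nabla\right)\mathbf{u}
    \right)_{\rm rot}.$$
Deduce that the lab-frame Euler equation
$$\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}
  = -\nabla P,$$
becomes, in the rotating frame,
$$\frac{\partial\mathbf{u}}{\partial t} + 2(\mathbf{\Omega}\times\mathbf{u})
  + (\mathbf{u}\cdot\nabla)\mathbf{u}
  = -\nabla\left(P - \frac12|\mathbf{\Omega}\times\mathbf{r}|^2\right).$$
We see that in the non-inertial rotating frame the fluid experiences a
$-2(\mathbf{\Omega}\times\mathbf{u})$ Coriolis and a
$\nabla|\mathbf{\Omega}\times\mathbf{r}|^2/2$ centrifugal force. By linearizing
the rotating-frame Euler equation, show that for small $\mathbf{u}$ we have
$$\frac{\partial\boldsymbol{\omega}}{\partial t}
  - 2(\mathbf{\Omega}\cdot\nabla)\mathbf{u} = 0, \qquad (\star)$$
where $\boldsymbol{\omega} = \mathrm{curl}\,\mathbf{u}$.

b) Take $\mathbf{\Omega}$ to be directed along the $z$ axis. Seek plane-wave
solutions to $(\star)$ in the form
$$\mathbf{u}(\mathbf{r},t) = \mathbf{u}_0\,
  e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$$
where $\mathbf{u}_0$ is a constant, and show that the dispersion equation for
these small amplitude inertial waves is
$$\omega = 2\Omega\sqrt{\frac{k_z^2}{k_x^2 + k_y^2 + k_z^2}}.$$
Deduce that the group velocity is directed perpendicular to $\mathbf{k}$,
i.e. at right-angles to the phase velocity. Conclude also that any slow flow
that is steady (time independent) when viewed from the rotating frame
is necessarily independent of the co-ordinate $z$. (This is the origin of
the phenomenon of Taylor columns, which are columns of stagnant fluid
lying above and below any obstacle immersed in such a flow.)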


Exercise 7.6: Non-linear Waves. In this problem we will explore the Riemann
invariants for a fluid with $P = \lambda^2\rho^3/3$. This is the equation of
state of a one-dimensional non-interacting Fermi gas.

a) From the continuity equation
$$\partial_t\rho + \partial_x\rho v = 0,$$
and Euler’s equation of motion
$$\rho(\partial_t v + v\,\partial_x v) = -\partial_x P,$$
deduce that
$$\left(\frac{\partial}{\partial t}
  + (\lambda\rho + v)\frac{\partial}{\partial x}\right)(\lambda\rho + v) = 0,$$
$$\left(\frac{\partial}{\partial t}
  + (-\lambda\rho + v)\frac{\partial}{\partial x}\right)(-\lambda\rho + v) = 0.$$
In what limit do these equations become equivalent to the wave equation
for one-dimensional sound? What is the sound speed in this case?

b) Show that the Riemann invariants $v \pm \lambda\rho$ are constant on
suitably defined characteristic curves. What is the local speed of propagation
of the waves moving to the right or left?

c) The fluid starts from rest, $v = 0$, but with a region where the density
is higher than elsewhere. Show that the Riemann equations will
inevitably break down at some later time due to the formation of shock
waves.

Exercise 7.7: Burgers Shocks. As a simple mathematical model for the formation
and decay of a shock wave consider Burgers’ Equation:
$$\partial_t u + u\,\partial_x u = \nu\,\partial^2_x u.$$
Note its similarity to the Riemann equations of the previous exercise. The
additional term on the right-hand side introduces dissipation and prevents the
solution becoming multi-valued.

a) Show that if $\nu = 0$ any solution of Burgers’ equation having a region
where $u$ decreases to the right will always eventually become multivalued.

b) Show that the Hopf-Cole transformation, $u = -2\nu\,\partial_x\ln\psi$,
leads to $\psi$ obeying a heat diffusion equation
$$\partial_t\psi = \nu\,\partial^2_x\psi.$$

c) Show that
$$\psi(x,t) = Ae^{\nu a^2t - ax} + Be^{\nu b^2t - bx}$$
is a solution of this heat equation, and so deduce that Burgers’ equation
has a shock-wave-like solution which travels to the right at speed
$C = \nu(a + b) = \frac12(u_L + u_R)$, the mean of the wave speeds to the left
and right of the shock. Show that the width of the shock is
$\approx 4\nu/|u_L - u_R|$.

Chapter 8


Special Functions


In solving Laplace’s equation by the method of separation of variables we
come across the most important of the special functions of mathematical
physics. These functions have been studied for many years, and books such as
the Bateman manuscript project¹ summarize the results. Any serious student of
theoretical physics needs to be familiar with this material, and should at
least read the standard text: A Course of Modern Analysis by E. T. Whittaker
and G. N. Watson (Cambridge University Press). Although it was originally
published in 1902, nothing has superseded this book in its accessibility and
usefulness.

In this chapter we will focus only on the properties that all physics stu- 
dents should know by heart. 


8.1 Curvilinear co-ordinates 


Laplace’s equation can be separated in a number of coordinate systems. 
These are all orthogonal systems in that the local coordinate axes cross at 
right angles. 


¹ The Bateman manuscript project contains the formulae collected by Harry
Bateman, who was professor of Mathematics, Theoretical Physics, and Aeronautics
at the California Institute of Technology. After his death in 1946, several
dozen shoe boxes full of file cards were found in his garage. These proved to
be the index to a mountain of paper containing his detailed notes. A subset of
the material was eventually published as the three volume series Higher
Transcendental Functions, and the two volume Tables of Integral
Transformations, A. Erdélyi et al. eds.

To any system of orthogonal curvilinear coordinates is associated a metric
of the form
$$ds^2 = h_1^2(dx^1)^2 + h_2^2(dx^2)^2 + h_3^2(dx^3)^2. \tag{8.1}$$
This expression tells us the distance $\sqrt{ds^2}$ between the adjacent points
$(x^1 + dx^1, x^2 + dx^2, x^3 + dx^3)$ and $(x^1, x^2, x^3)$. In general, the
$h_i$ will depend on the co-ordinates $x^i$.

The most commonly used orthogonal curvilinear co-ordinate systems are
plane polars, spherical polars, and cylindrical polars. The Laplacian also
separates in plane elliptic, or three-dimensional ellipsoidal coordinates and
their degenerate limits, such as parabolic cylindrical co-ordinates, but these
are not so often encountered, and for their properties we refer the reader to
comprehensive treatises such as Morse and Feshbach’s Methods of Theoretical
Physics.


Plane polar co-ordinates


Figure 8.1: Plane polar co-ordinates.


Plane polar co-ordinates have metric
$$ds^2 = dr^2 + r^2d\theta^2, \tag{8.2}$$
so $h_r = 1$, $h_\theta = r$.

Spherical polar co-ordinates


Figure 8.2: Spherical co-ordinates.


This system has metric
$$ds^2 = dr^2 + r^2d\theta^2 + r^2\sin^2\theta\,d\phi^2, \tag{8.3}$$
so $h_r = 1$, $h_\theta = r$, $h_\phi = r\sin\theta$.


Cylindrical polar co-ordinates


Figure 8.3: Cylindrical co-ordinates.


These have metric
$$ds^2 = dr^2 + r^2d\theta^2 + dz^2, \tag{8.4}$$
so $h_r = 1$, $h_\theta = r$, $h_z = 1$.

8.1.1 Div, grad and curl in curvilinear co-ordinates


It is very useful to know how to write the curvilinear co-ordinate expressions 
for the common operations of the vector calculus. Knowing these, we can 
then write down the expression for the Laplace operator. 


The gradient operator 


We begin with the gradient operator. This is a vector quantity, and to
express it we need to understand how to associate a set of basis vectors with
our co-ordinate system. The simplest thing to do is to take unit vectors
$\mathbf{e}_i$ tangential to the local co-ordinate axes. Because the coordinate
system is orthogonal, these unit vectors will then constitute an orthonormal
system.


Figure 8.4: Unit basis vectors in plane polar co-ordinates. 


The vector corresponding to an infinitesimal co-ordinate displacement $dx^i$ is
then given by
$$d\mathbf{r} = h_1dx^1\mathbf{e}_1 + h_2dx^2\mathbf{e}_2
  + h_3dx^3\mathbf{e}_3. \tag{8.5}$$
Using the orthonormality of the basis vectors, we find that
$$ds^2 \equiv |d\mathbf{r}|^2
  = h_1^2(dx^1)^2 + h_2^2(dx^2)^2 + h_3^2(dx^3)^2, \tag{8.6}$$
as before.
In the unit-vector basis, the gradient vector is
$$\mathrm{grad}\,\phi \equiv \nabla\phi
  = \frac{1}{h_1}\left(\frac{\partial\phi}{\partial x^1}\right)\mathbf{e}_1
  + \frac{1}{h_2}\left(\frac{\partial\phi}{\partial x^2}\right)\mathbf{e}_2
  + \frac{1}{h_3}\left(\frac{\partial\phi}{\partial x^3}\right)\mathbf{e}_3,
  \tag{8.7}$$
so that
$$(\mathrm{grad}\,\phi)\cdot d\mathbf{r}
  = \frac{\partial\phi}{\partial x^1}dx^1
  + \frac{\partial\phi}{\partial x^2}dx^2
  + \frac{\partial\phi}{\partial x^3}dx^3, \tag{8.8}$$
which is the change in the value $\phi$ due to the displacement.

The numbers $(h_1dx^1, h_2dx^2, h_3dx^3)$ are often called the physical
components of the displacement $d\mathbf{r}$, to distinguish them from the
numbers $(dx^1, dx^2, dx^3)$ which are the co-ordinate components of
$d\mathbf{r}$. The physical components of a displacement vector all have the
dimensions of length. The co-ordinate components may have different dimensions
and units for each component. In plane polar co-ordinates, for example, the
units will be meters and radians. This distinction extends to the gradient
itself: the co-ordinate components of an electric field expressed in polar
co-ordinates will have units of volts per meter and volts per radian for the
radial and angular components, respectively. The factor
$1/h_\theta = r^{-1}$ serves to convert the latter to volts per meter.


The divergence 


The divergence of a vector field $\mathbf{A}$ is defined to be the flux of $\mathbf{A}$ out of an
infinitesimal region, divided by the volume of the region.


Figure 8.5: Flux out of an infinitesimal volume with sides of length $h_1dx^1$,
$h_2dx^2$, $h_3dx^3$.


In the figure, the flux out of the two end faces is
$$dx^2dx^3\left[A_1h_2h_3\big|_{(x^1+dx^1,\,x^2,\,x^3)} - A_1h_2h_3\big|_{(x^1,\,x^2,\,x^3)}\right] \approx dx^1dx^2dx^3\,\frac{\partial(A_1h_2h_3)}{\partial x^1}. \tag{8.9}$$
Adding the contributions from the other two pairs of faces, and dividing by
the volume, $h_1h_2h_3\,dx^1dx^2dx^3$, gives
$$\operatorname{div}\mathbf{A} = \frac{1}{h_1h_2h_3}\left\{\frac{\partial}{\partial x^1}(h_2h_3A_1) + \frac{\partial}{\partial x^2}(h_1h_3A_2) + \frac{\partial}{\partial x^3}(h_1h_2A_3)\right\}. \tag{8.10}$$
Note that in curvilinear co-ordinates $\operatorname{div}\mathbf{A}$ is no longer simply $\nabla\cdot\mathbf{A}$, although
one often writes it as such.


The curl 


The curl of a vector field $\mathbf{A}$ is a vector whose component in the direction of
the normal to an infinitesimal area element is the line integral of $\mathbf{A}$ round the
infinitesimal area, divided by the area.


Figure 8.6: Line integral round infinitesimal area with sides of length $h_1dx^1$,
$h_2dx^2$, and normal $\mathbf{e}_3$.


The third component is, for example,
$$(\operatorname{curl}\mathbf{A})_3 = \frac{1}{h_1h_2}\left(\frac{\partial(h_2A_2)}{\partial x^1} - \frac{\partial(h_1A_1)}{\partial x^2}\right). \tag{8.11}$$
The other two components are found by cyclically permuting $1 \to 2 \to 3 \to 1$
in this formula. The curl is thus no longer equal to $\nabla\times\mathbf{A}$, although it is
common to write it as if it were.
Note that the factors of $h_i$ are disposed so that the vector identities
$$\operatorname{curl}\operatorname{grad}\varphi = 0, \tag{8.12}$$
and
$$\operatorname{div}\operatorname{curl}\mathbf{A} = 0, \tag{8.13}$$
continue to hold for any scalar field $\varphi$, and any vector field $\mathbf{A}$.


8.1.2 The Laplacian in curvilinear co-ordinates


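The Laplacian acting on scalars is “div grad”, and is therefore
$$\nabla^2\varphi = \frac{1}{h_1h_2h_3}\left\{\frac{\partial}{\partial x^1}\left(\frac{h_2h_3}{h_1}\frac{\partial\varphi}{\partial x^1}\right) + \frac{\partial}{\partial x^2}\left(\frac{h_1h_3}{h_2}\frac{\partial\varphi}{\partial x^2}\right) + \frac{\partial}{\partial x^3}\left(\frac{h_1h_2}{h_3}\frac{\partial\varphi}{\partial x^3}\right)\right\}. \tag{8.14}$$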


This formula is worth committing to memory. 
When the Laplacian is to act on a vector field, we must use the vector 
Laplacian 


$$\nabla^2\mathbf{A} = \operatorname{grad}\operatorname{div}\mathbf{A} - \operatorname{curl}\operatorname{curl}\mathbf{A}. \tag{8.15}$$


In curvilinear co-ordinates this is no longer equivalent to the Laplacian acting 
on each component of A, treating it as if it were a scalar. The expression 
(8.15) is the appropriate generalization of the vector Laplacian to curvilinear 
co-ordinates because it is defined in terms of the co-ordinate independent 
operators div, grad, and curl, and reduces to the Laplacian on the individual 
components when the co-ordinate system is Cartesian.

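In spherical polars the Laplace operator acting on the scalar field $\varphi$ is
$$\begin{aligned}
\nabla^2\varphi &= \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\varphi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\varphi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\varphi}{\partial\phi^2} \\
&= \frac{1}{r}\frac{\partial^2(r\varphi)}{\partial r^2} + \frac{1}{r^2}\left\{\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\varphi}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2\varphi}{\partial\phi^2}\right\} \\
&= \frac{1}{r}\frac{\partial^2(r\varphi)}{\partial r^2} - \frac{\hat{L}^2}{r^2}\varphi,
\end{aligned} \tag{8.16}$$
where
$$\hat{L}^2 = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} - \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2} \tag{8.17}$$
is (after multiplication by $\hbar^2$) the operator representing the square of the
angular momentum in quantum mechanics.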
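
As a worked check, here is a sketch of how the spherical-polar result follows from the general formula (8.14), assuming the labelling $x^1 = r$, $x^2 = \theta$, $x^3 = \phi$ with the scale factors read off from (8.3). The symbol $\varphi$ for the scalar field is our notation.

```latex
% Scale factors from the metric (8.3): h_1 = 1, h_2 = r, h_3 = r\sin\theta,
% so h_1 h_2 h_3 = r^2 \sin\theta.
\begin{aligned}
\nabla^2\varphi
 &= \frac{1}{r^2\sin\theta}\left\{
      \frac{\partial}{\partial r}\left(r^2\sin\theta\,\frac{\partial\varphi}{\partial r}\right)
    + \frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\varphi}{\partial\theta}\right)
    + \frac{\partial}{\partial\phi}\left(\frac{1}{\sin\theta}\,\frac{\partial\varphi}{\partial\phi}\right)
    \right\} \\
 &= \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\varphi}{\partial r}\right)
    + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\varphi}{\partial\theta}\right)
    + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\varphi}{\partial\phi^2}. \\[4pt]
% The radial term can be rewritten using
\frac{1}{r}\frac{\partial^2 (r\varphi)}{\partial r^2}
 &= \frac{\partial^2\varphi}{\partial r^2} + \frac{2}{r}\frac{\partial\varphi}{\partial r}
  = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\varphi}{\partial r}\right).
\end{aligned}
```

The same substitution with $h_1 = 1$, $h_2 = r$, $h_3 = 1$ gives the cylindrical form.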
In cylindrical polars the Laplacian is
$$\nabla^2 = \frac{1}{r}\frac{\partial}{\partial r}r\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2} + \frac{\partial^2}{\partial z^2}. \tag{8.18}$$


8.2 Spherical harmonics
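We saw that Laplace's equation in spherical polars is
$$\frac{1}{r}\frac{\partial^2(r\varphi)}{\partial r^2} - \frac{\hat{L}^2}{r^2}\varphi = 0. \tag{8.19}$$
To solve this by the method of separation of variables, we factorize
$$\varphi = R(r)Y(\theta,\phi), \tag{8.20}$$
so that
$$\frac{1}{Rr}\frac{d^2(rR)}{dr^2} - \frac{1}{r^2}\left(\frac{1}{Y}\hat{L}^2Y\right) = 0. \tag{8.21}$$
Taking the separation constant to be $l(l+1)$, we have
$$r^2\frac{d^2(rR)}{dr^2} - l(l+1)(rR) = 0, \tag{8.22}$$
and
$$\hat{L}^2Y = l(l+1)Y. \tag{8.23}$$


The solution for $R$ is $r^l$ or $r^{-l-1}$. The equation for $Y$ can be further decomposed
by setting $Y = \Theta(\theta)\Phi(\phi)$. Looking back at the definition of $\hat{L}^2$, we see
that we can take
$$\Phi(\phi) = e^{im\phi} \tag{8.24}$$
with $m$ an integer to ensure single valuedness. The equation for $\Theta$ is then
$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) - \frac{m^2}{\sin^2\theta}\Theta = -l(l+1)\Theta. \tag{8.25}$$
It is convenient to set $x = \cos\theta$; then
$$\left(\frac{d}{dx}(1-x^2)\frac{d}{dx} + l(l+1) - \frac{m^2}{1-x^2}\right)\Theta = 0. \tag{8.26}$$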

8.2.1 Legendre polynomials 
We first look at the axially symmetric case where $m = 0$. We are left with
$$\left(\frac{d}{dx}(1-x^2)\frac{d}{dx} + l(l+1)\right)\Theta = 0. \tag{8.27}$$
This is Legendre's equation. We can think of it as an eigenvalue problem
$$-\left(\frac{d}{dx}(1-x^2)\frac{d}{dx}\right)\Theta(x) = l(l+1)\Theta(x), \tag{8.28}$$
on the interval $-1 \le x \le 1$, this being the range of $\cos\theta$ for real $\theta$. Legendre's
equation is of Sturm-Liouville form, but with regular singular points at $x = \pm 1$.
Because the endpoints of the interval are singular, we cannot impose
as boundary conditions that $\Theta$, $\Theta'$, or some linear combination of these, be
zero there. We do need some boundary conditions, however, so as to have a
self-adjoint operator and a complete set of eigenfunctions.

Given one or more singular endpoints, a possible route to a well-defined
eigenvalue problem is to require solutions to be square-integrable, and so
normalizable. This condition suffices for the harmonic-oscillator Schrödinger
equation, for example, because at most one of the two solutions is
square-integrable. For Legendre's equation with $l = 0$, the two independent solutions
are $\Theta(x) = 1$ and $\Theta(x) = \ln(1+x) - \ln(1-x)$. Both of these solutions have
finite $L^2[-1,1]$ norms, and this square integrability persists for all values of
$l$. Thus, demanding normalizability is not enough to select a unique boundary
condition. Instead, each endpoint possesses a one-parameter family of
boundary conditions that lead to self-adjoint operators. We therefore make
the more restrictive demand that the allowed eigenfunctions be finite at the
endpoints. Because the north and south poles of the sphere are not special
points, this is a physically reasonable condition. When $l$ is an integer,
one of the solutions, $P_l(x)$, becomes a polynomial, and so is finite at $x = \pm 1$.
The second solution $Q_l(x)$ is divergent at both ends, and so is not an allowed
solution. When $l$ is not an integer, neither solution is finite. The eigenvalues
are therefore $l(l+1)$ with $l$ zero or a positive integer. Despite its unfamiliar
form, the “finite” boundary condition makes the Legendre operator
self-adjoint, and the Legendre polynomials $P_l(x)$ form a complete orthogonal
set for $L^2[-1,1]$.

Proving orthogonality is easy: we follow the usual strategy for Sturm-Liouville
equations with non-singular boundary conditions to deduce that
$$[l(l+1) - m(m+1)]\int_{-1}^{1}P_l(x)P_m(x)\,dx = \Big[(P_lP_m' - P_l'P_m)(1-x^2)\Big]_{-1}^{1}. \tag{8.29}$$
Since the $P_l$'s remain finite at $\pm 1$, the right hand side is zero because of the
$(1-x^2)$ factor, and so $\int_{-1}^{1}P_l(x)P_m(x)\,dx$ is zero if $l \ne m$. (Observe that
this last step differs from the usual argument where it is the vanishing of the
eigenfunction or its derivative that makes the integrated-out term zero.)

Because they are orthogonal polynomials, the $P_l(x)$ can be obtained by
applying the Gram-Schmidt procedure to the sequence $1, x, x^2, \ldots$ to obtain
polynomials orthogonal with respect to the $w \equiv 1$ inner product, and then
fixing the normalization constant. The result of this process can be expressed
in closed form as
$$P_l(x) = \frac{1}{2^l\,l!}\frac{d^l}{dx^l}(x^2-1)^l. \tag{8.30}$$
This is called Rodrigues' formula. It should be clear that this formula outputs
a polynomial of degree $l$. The coefficient $1/2^l\,l!$ comes from the traditional
normalization for the Legendre polynomials that makes $P_l(1) = 1$. This
convention does not lead to an orthonormal set. Instead, we have
$$\int_{-1}^{1}P_l(x)P_m(x)\,dx = \frac{2}{2l+1}\delta_{lm}. \tag{8.31}$$
It is easy to show that this integral is zero if $l > m$: simply integrate by
parts $l$ times so as to take the $l$ derivatives off $(x^2-1)^l$ and onto $(x^2-1)^m$,
which they kill. We will evaluate the $l = m$ integral in the next section.

We now show that the $P_l(x)$ given by Rodrigues' formula are indeed solutions
of Legendre's equation. Let $v = (x^2-1)^l$; then
$$(1-x^2)v' + 2lxv = 0. \tag{8.32}$$
We differentiate this $l+1$ times using Leibniz' theorem
$$[uv]^{(n)} = \sum_{m=0}^{n}\binom{n}{m}u^{(m)}v^{(n-m)} = uv^{(n)} + nu'v^{(n-1)} + \frac{1}{2}n(n-1)u''v^{(n-2)} + \cdots. \tag{8.33}$$
We find that
$$\begin{aligned}
[(1-x^2)v']^{(l+1)} &= (1-x^2)v^{(l+2)} - (l+1)2xv^{(l+1)} - l(l+1)v^{(l)}, \\
[2lxv]^{(l+1)} &= 2lxv^{(l+1)} + 2l(l+1)v^{(l)}.
\end{aligned} \tag{8.34}$$
Putting these two terms together we obtain
$$\left((1-x^2)\frac{d^2}{dx^2} - 2x\frac{d}{dx} + l(l+1)\right)\frac{d^l}{dx^l}(x^2-1)^l = 0, \tag{8.35}$$
which is Legendre's equation.
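The $P_l(x)$ have alternating parity
$$P_l(-x) = (-1)^lP_l(x), \tag{8.36}$$
and the first few are
$$\begin{aligned}
P_0(x) &= 1, \\
P_1(x) &= x, \\
P_2(x) &= \tfrac{1}{2}(3x^2-1), \\
P_3(x) &= \tfrac{1}{2}(5x^3-3x), \\
P_4(x) &= \tfrac{1}{8}(35x^4-30x^2+3).
\end{aligned}$$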
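
The list above can be reproduced mechanically from the closed-form expression (8.30). The following is a minimal sketch, assuming exact rational arithmetic and a coefficient-list representation of polynomials (lowest power first); the function name `legendre_coeffs` is ours and purely illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def legendre_coeffs(l):
    """Coefficients [c_0, ..., c_l] of P_l(x), lowest power first, from (8.30)."""
    # Binomial expansion: (x^2 - 1)^l = sum_k C(l, k) (-1)^(l-k) x^(2k).
    poly = [Fraction(0)] * (2 * l + 1)
    for k in range(l + 1):
        poly[2 * k] = Fraction(comb(l, k) * (-1) ** (l - k))
    # Differentiate l times: d/dx of c_n x^n is n c_n x^(n-1).
    for _ in range(l):
        poly = [n * c for n, c in enumerate(poly)][1:]
    # Apply the conventional normalization 1 / (2^l l!).
    norm = Fraction(1, 2 ** l * factorial(l))
    return [norm * c for c in poly]


for l in range(5):
    print(l, [str(c) for c in legendre_coeffs(l)])
```

Summing the coefficients of any row gives $P_l(1)$, which should equal 1 with this normalization.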


8.2.2 Axisymmetric potential problems 


The essential property of the $P_l(x)$ is that the general axisymmetric solution
of $\nabla^2\varphi = 0$ can be expanded in terms of them as
$$\varphi(r,\theta) = \sum_{l=0}^{\infty}\left(A_lr^l + B_lr^{-l-1}\right)P_l(\cos\theta). \tag{8.37}$$
You should memorize this formula. You should also know by heart the explicit
expressions for the first four $P_l(x)$, and the factor of $2/(2l+1)$ in the
orthogonality formula.

Example: Point charge. Put a unit charge at the point R, and find an ex- 
pansion for the potential as a Legendre polynomial series in a neighbourhood 
of the origin. 


Figure 8.7: Geometry for generating function.


Let us start by assuming that $|\mathbf{r}| < |\mathbf{R}|$. We know that in this region the point
charge potential $1/|\mathbf{r}-\mathbf{R}|$ is a solution of Laplace's equation, and so we can
expand
$$\frac{1}{|\mathbf{r}-\mathbf{R}|} \equiv \frac{1}{\sqrt{r^2+R^2-2rR\cos\theta}} = \sum_{l=0}^{\infty}A_lr^lP_l(\cos\theta). \tag{8.38}$$
We knew that the coefficients $B_l$ were zero because $\varphi$ is finite when $r = 0$.
We can find the coefficients $A_l$ by setting $\theta = 0$ and Taylor expanding
$$\frac{1}{|\mathbf{r}-\mathbf{R}|} = \frac{1}{R-r} = \frac{1}{R}\left(1 + \frac{r}{R} + \left(\frac{r}{R}\right)^2 + \cdots\right), \quad r < R. \tag{8.39}$$
By comparing the two series and noting that $P_l(1) = 1$, we find that $A_l = R^{-l-1}$. Thus
$$\frac{1}{\sqrt{r^2+R^2-2rR\cos\theta}} = \frac{1}{R}\sum_{l=0}^{\infty}\left(\frac{r}{R}\right)^lP_l(\cos\theta), \quad r < R. \tag{8.40}$$
This last expression is the generating function formula for Legendre polynomials.
It is also a useful formula to have in your long-term memory.
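
The expansion (8.40) is easy to test numerically. The sketch below compares a truncated sum with the closed-form potential. Note one assumption not taken from the text: $P_l(x)$ is evaluated with the standard three-term (Bonnet) recurrence $(n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}$ rather than with Rodrigues' formula, simply because it is convenient in floating point. The function names are illustrative.

```python
import math


def legendre(l, x):
    """P_l(x) via the Bonnet recurrence (a standard identity, assumed here)."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def potential_series(r, R, theta, lmax=60):
    """Truncated right-hand side of (8.40), valid for r < R."""
    x = math.cos(theta)
    return sum((r / R) ** l * legendre(l, x) for l in range(lmax + 1)) / R


def potential_exact(r, R, theta):
    """Closed-form point-charge potential 1/|r - R|."""
    return 1.0 / math.sqrt(r * r + R * R - 2.0 * r * R * math.cos(theta))


for theta in (0.0, 0.7, 2.0):
    print(theta, potential_series(0.5, 1.0, theta), potential_exact(0.5, 1.0, theta))
```

Convergence slows as $r/R$ approaches 1, so `lmax` would need to grow for points near the sphere $r = R$.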
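If $|\mathbf{r}| > |\mathbf{R}|$, then we must take
$$\frac{1}{|\mathbf{r}-\mathbf{R}|} \equiv \frac{1}{\sqrt{r^2+R^2-2rR\cos\theta}} = \sum_{l=0}^{\infty}B_lr^{-l-1}P_l(\cos\theta), \tag{8.41}$$
because we know that $\varphi$ tends to zero when $r = \infty$. We now set $\theta = 0$ and
compare with
$$\frac{1}{|\mathbf{r}-\mathbf{R}|} = \frac{1}{r-R} = \frac{1}{r}\left(1 + \frac{R}{r} + \left(\frac{R}{r}\right)^2 + \cdots\right), \quad R < r, \tag{8.42}$$
to get
$$\frac{1}{\sqrt{r^2+R^2-2rR\cos\theta}} = \frac{1}{r}\sum_{l=0}^{\infty}\left(\frac{R}{r}\right)^lP_l(\cos\theta), \quad R < r. \tag{8.43}$$
Observe that we made no use of the normalization integral
$$\int_{-1}^{1}\{P_l(x)\}^2\,dx = 2/(2l+1) \tag{8.44}$$
in deriving the generating function expansion for the Legendre polynomials.
The following exercise shows that this expansion, taken together with
their previously established orthogonality property, can be used to establish
(8.44).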


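Exercise 8.1: Use the generating function for Legendre polynomials $P_l(x)$ to
show that
$$\sum_{l=0}^{\infty}z^{2l}\left(\int_{-1}^{1}\{P_l(x)\}^2\,dx\right) = \int_{-1}^{1}\frac{1}{1-2xz+z^2}\,dx = -\frac{1}{z}\ln\left(\frac{1-z}{1+z}\right), \quad |z| < 1.$$
By Taylor expanding the logarithm, and comparing the coefficients of $z^{2l}$,
evaluate $\int_{-1}^{1}\{P_l(x)\}^2\,dx$.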


Example: A planet is spinning on its axis and so its shape deviates slightly 
from a perfect sphere. The position of its surface is given by 


$$R(\theta,\phi) = R_0 + \eta P_2(\cos\theta). \tag{8.45}$$


Observe that, to first order in $\eta$, this deformation does not alter the volume
of the body. Assuming that the planet has a uniform density po, compute 
the external gravitational potential of the planet. 


Figure 8.8: Deformed planet. 


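The gravitational potential obeys Poisson's equation
$$\nabla^2\varphi = 4\pi G\rho(\mathbf{x}), \tag{8.46}$$
where $G$ is Newton's gravitational constant. We expand $\varphi$ as a power series
in $\eta$
$$\varphi(r,\theta) = \varphi_0(r,\theta) + \eta\varphi_1(r,\theta) + \cdots. \tag{8.47}$$
We also decompose the gravitating mass into a uniform undeformed sphere,
which gives the external potential
$$\varphi_{0,\mathrm{ext}}(r,\theta) = -\left(\frac{4}{3}\pi R_0^3\rho_0\right)\frac{G}{r}, \quad r > R_0, \tag{8.48}$$
and a thin spherical shell of areal mass-density
$$\sigma(\theta) = \rho_0\eta P_2(\cos\theta). \tag{8.49}$$
The thin shell gives rise to the potential
$$\varphi_{1,\mathrm{int}}(r,\theta) = Ar^2P_2(\cos\theta), \quad r < R_0, \tag{8.50}$$
and
$$\varphi_{1,\mathrm{ext}}(r,\theta) = B\frac{1}{r^3}P_2(\cos\theta), \quad r > R_0. \tag{8.51}$$
At the shell we must have $\varphi_{1,\mathrm{int}} = \varphi_{1,\mathrm{ext}}$ and
$$\frac{\partial\varphi_{1,\mathrm{ext}}}{\partial r} - \frac{\partial\varphi_{1,\mathrm{int}}}{\partial r} = 4\pi G\sigma(\theta). \tag{8.52}$$
Thus $A = BR_0^{-5}$, and
$$B = -\frac{4}{5}\pi G\eta\rho_0R_0^4. \tag{8.53}$$
Putting this together, we have
$$\varphi(r,\theta) = -\left(\frac{4}{3}\pi G\rho_0R_0^3\right)\frac{1}{r} - \frac{4}{5}\left(\pi G\eta\rho_0R_0^4\right)\frac{P_2(\cos\theta)}{r^3} + O(\eta^2), \quad r > R_0. \tag{8.54}$$


8.2.3 General spherical harmonics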


When we do not have axisymmetry, we need the full set of spherical harmonics.
These involve solutions of
$$\left(\frac{d}{dx}(1-x^2)\frac{d}{dx} + l(l+1) - \frac{m^2}{1-x^2}\right)\Theta = 0, \tag{8.55}$$
which is the associated Legendre equation. This looks like another complicated
equation with singular endpoints, but its bounded solutions can
be obtained by differentiating Legendre polynomials. On substituting
$y = (1-x^2)^{m/2}z(x)$ into (8.55), and comparing the resulting equation for $z(x)$
with the $m$-th derivative of Legendre's equation, we find that
$$P_l^m(x) \equiv (1-x^2)^{m/2}\frac{d^m}{dx^m}P_l(x) \tag{8.56}$$
is a solution of (8.55) that remains finite ($m = 0$) or goes to zero ($m > 0$)
at the endpoints $x = \pm 1$. Since $P_l(x)$ is a polynomial of degree $l$, we must
have $P_l^m(x) = 0$ if $m > l$. For each $l$, the allowed values of $m$ in this
formula are therefore $0, 1, \ldots, l$. Our definition (8.56) of the $P_l^m(x)$ can be
extended to negative integer $m$ by interpreting $d^{-|m|}/dx^{-|m|}$ as an instruction
to integrate the Legendre polynomial $|m|$ times, instead of differentiating it,
but the resulting $P_l^{-|m|}(x)$ are proportional to $P_l^{|m|}(x)$, so nothing new is
gained by this conceit.
The spherical harmonics are the normalized product of these associated
Legendre functions with the corresponding $e^{im\phi}$:
$$Y_l^m(\theta,\phi) \propto P_l^{|m|}(\cos\theta)e^{im\phi}, \quad -l \le m \le l. \tag{8.57}$$
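The first few are
$$l = 0: \quad Y_0^0 = \frac{1}{\sqrt{4\pi}}, \tag{8.58}$$
$$l = 1: \quad \begin{cases}
Y_1^1 = -\sqrt{\dfrac{3}{8\pi}}\,\sin\theta\,e^{i\phi}, \\[4pt]
Y_1^0 = \sqrt{\dfrac{3}{4\pi}}\,\cos\theta, \\[4pt]
Y_1^{-1} = \sqrt{\dfrac{3}{8\pi}}\,\sin\theta\,e^{-i\phi},
\end{cases} \tag{8.59}$$
$$l = 2: \quad \begin{cases}
Y_2^2 = \dfrac{1}{4}\sqrt{\dfrac{15}{2\pi}}\,\sin^2\theta\,e^{2i\phi}, \\[4pt]
Y_2^1 = -\sqrt{\dfrac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{i\phi}, \\[4pt]
Y_2^0 = \sqrt{\dfrac{5}{4\pi}}\left(\dfrac{3}{2}\cos^2\theta - \dfrac{1}{2}\right), \\[4pt]
Y_2^{-1} = \sqrt{\dfrac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{-i\phi}, \\[4pt]
Y_2^{-2} = \dfrac{1}{4}\sqrt{\dfrac{15}{2\pi}}\,\sin^2\theta\,e^{-2i\phi}.
\end{cases} \tag{8.60}$$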

The spherical harmonics compose an orthonormal
$$\int_0^{2\pi}d\phi\int_0^{\pi}\sin\theta\,d\theta\,[Y_l^m(\theta,\phi)]^*\,Y_{l'}^{m'}(\theta,\phi) = \delta_{ll'}\delta_{mm'}, \tag{8.61}$$
and complete
$$\sum_{l=0}^{\infty}\sum_{m=-l}^{l}[Y_l^m(\theta',\phi')]^*\,Y_l^m(\theta,\phi) = \delta(\phi-\phi')\,\delta(\cos\theta'-\cos\theta) \tag{8.62}$$
set of functions on the unit sphere. In terms of them, the general solution to
$\nabla^2\varphi = 0$ is
$$\varphi(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(A_{lm}r^l + B_{lm}r^{-l-1}\right)Y_l^m(\theta,\phi). \tag{8.63}$$
This is definitely a formula to remember.
When $m = 0$, the spherical harmonics are independent of the azimuthal
angle $\phi$, and so must be proportional to the Legendre polynomials. The
exact relation is
$$Y_l^0(\theta,\phi) = \sqrt{\frac{2l+1}{4\pi}}\,P_l(\cos\theta). \tag{8.64}$$


If we use a unit vector $\mathbf{n}$ to denote a point on the unit sphere, we have the
symmetry properties
$$[Y_l^m(\mathbf{n})]^* = (-1)^mY_l^{-m}(\mathbf{n}), \qquad Y_l^m(-\mathbf{n}) = (-1)^lY_l^m(\mathbf{n}). \tag{8.65}$$


These identities are useful when we wish to know how quantum mechanical 
wavefunctions transform under time reversal or parity. 
There is an addition theorem
$$P_l(\cos\gamma) = \frac{4\pi}{2l+1}\sum_{m=-l}^{l}[Y_l^m(\theta',\phi')]^*\,Y_l^m(\theta,\phi), \tag{8.66}$$
where $\gamma$ is the angle between the directions $(\theta,\phi)$ and $(\theta',\phi')$, and is found
from
$$\cos\gamma = \cos\theta\cos\theta' + \sin\theta\sin\theta'\cos(\phi-\phi'). \tag{8.67}$$
The addition theorem is established by first showing that the right-hand side
is rotationally invariant, and then setting the direction $(\theta',\phi')$ to point along
the $z$ axis. Addition theorems of this sort are useful because they allow one
to replace a simple function of an entangled variable by a sum of functions
of unentangled variables. For example, the point-charge potential can be
disentangled as
$$\frac{1}{|\mathbf{r}-\mathbf{r}'|} = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\frac{4\pi}{2l+1}\left(\frac{r_<^l}{r_>^{l+1}}\right)[Y_l^m(\theta',\phi')]^*\,Y_l^m(\theta,\phi), \tag{8.68}$$
where $r_<$ is the smaller of $|\mathbf{r}|$ or $|\mathbf{r}'|$, and $r_>$ is the greater, and $(\theta,\phi)$, $(\theta',\phi')$
specify the direction of $\mathbf{r}$, $\mathbf{r}'$ respectively. This expansion is derived by
combining the generating function for the Legendre polynomials with the addition
formula. It is useful for defining and evaluating multipole expansions.


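Exercise 8.2: Show that
$$\begin{aligned}
Y_2^{2} &\propto (x+iy)^2, \\
Y_2^{1} &\propto (x+iy)z, \\
Y_2^{0} &\propto x^2+y^2-2z^2, \\
Y_2^{-1} &\propto (x-iy)z, \\
Y_2^{-2} &\propto (x-iy)^2,
\end{aligned}$$
where $x^2+y^2+z^2 = 1$ are the usual Cartesian co-ordinates, restricted to the
unit sphere.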


8.3 Bessel functions 


In cylindrical polar co-ordinates, Laplace's equation is
$$0 = \nabla^2\varphi = \frac{1}{r}\frac{\partial}{\partial r}r\frac{\partial\varphi}{\partial r} + \frac{1}{r^2}\frac{\partial^2\varphi}{\partial\theta^2} + \frac{\partial^2\varphi}{\partial z^2}. \tag{8.69}$$
If we set $\varphi = R(r)e^{im\theta}e^{\pm kz}$ we find that $R(r)$ obeys
$$\frac{d^2R}{dr^2} + \frac{1}{r}\frac{dR}{dr} + \left(k^2 - \frac{m^2}{r^2}\right)R = 0. \tag{8.70}$$
Now
$$\frac{d^2y}{dx^2} + \frac{1}{x}\frac{dy}{dx} + \left(1 - \frac{\nu^2}{x^2}\right)y = 0 \tag{8.71}$$
is Bessel's equation and its solutions are Bessel functions of order $\nu$. The
solutions for $R$ will therefore be Bessel functions of order $m$, but with $x$
replaced by $kr$.


8.3.1 Cylindrical Bessel functions


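We now set about solving Bessel's equation,
$$\frac{d^2y}{dx^2} + \frac{1}{x}\frac{dy}{dx} + \left(1 - \frac{\nu^2}{x^2}\right)y = 0. \tag{8.72}$$
This has a regular singular point at the origin, and an irregular singular point
at infinity. We seek a series solution of the form
$$y = x^\lambda(1 + a_1x + a_2x^2 + \cdots), \tag{8.73}$$
and find from the indicial equation that $\lambda = \pm\nu$. Setting $\lambda = \nu$ and
inserting the series into the equation, we find, with a conventional choice for
normalization, that
$$y = J_\nu(x) \equiv \left(\frac{x}{2}\right)^\nu\sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,(n+\nu)!}\left(\frac{x}{2}\right)^{2n}. \tag{8.74}$$
Here $(n+\nu)! \equiv \Gamma(n+\nu+1)$. The functions $J_\nu(x)$ are called cylindrical Bessel
functions.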

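If $\nu$ is an integer we find that $J_{-n}(x) = (-1)^nJ_n(x)$, so we have only found
one of the two independent solutions. It is therefore traditional to define the
Neumann function
$$N_\nu(x) = \frac{J_\nu(x)\cos\nu\pi - J_{-\nu}(x)}{\sin\nu\pi}, \tag{8.75}$$
as this remains an independent second solution even as $\nu$ becomes integral.
At short distance, and for $\nu$ not an integer,
$$\begin{aligned}
J_\nu(x) &= \frac{1}{\Gamma(\nu+1)}\left(\frac{x}{2}\right)^\nu + \cdots, \\
N_\nu(x) &= -\frac{1}{\pi}\Gamma(\nu)\left(\frac{x}{2}\right)^{-\nu} + \cdots.
\end{aligned} \tag{8.76}$$
When $\nu$ tends to zero, we have
$$\begin{aligned}
J_0(x) &= 1 - \frac{1}{4}x^2 + \cdots, \\
N_0(x) &= \left(\frac{2}{\pi}\right)(\ln x/2 + \gamma) + \cdots,
\end{aligned} \tag{8.77}$$
where $\gamma = 0.57721\ldots$ denotes the Euler-Mascheroni constant. For fixed $\nu$,
and $x \gg \nu$ we have the asymptotic expansions
$$J_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\cos\left(x - \frac{1}{2}\nu\pi - \frac{1}{4}\pi\right)\left(1 + O\left(\frac{1}{x}\right)\right), \tag{8.78}$$
$$N_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\sin\left(x - \frac{1}{2}\nu\pi - \frac{1}{4}\pi\right)\left(1 + O\left(\frac{1}{x}\right)\right). \tag{8.79}$$
It is therefore natural to define the Hankel functions
$$H_\nu^{(1)}(x) = J_\nu(x) + iN_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\,e^{i(x-\nu\pi/2-\pi/4)}, \tag{8.80}$$
$$H_\nu^{(2)}(x) = J_\nu(x) - iN_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\,e^{-i(x-\nu\pi/2-\pi/4)}. \tag{8.81}$$
We will derive these asymptotic forms in chapter 19.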


Generating function 


The two-dimensional wave equation
$$\left(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)\Phi(r,\theta,t) = 0 \tag{8.82}$$
has solutions
$$\Phi = e^{i\omega t}e^{in\theta}J_n(kr), \tag{8.83}$$
where $k = |\omega|/c$. Equivalently, the two-dimensional Helmholtz equation
$$(\nabla^2 + k^2)\Phi = 0 \tag{8.84}$$
has solutions $e^{in\theta}J_n(kr)$. It also has solutions with $J_n(kr)$ replaced by $N_n(kr)$,
but these are not finite at the origin. Since the $e^{in\theta}J_n(kr)$ are the only
solutions that are finite at the origin, any other finite solution should be
expandable in terms of them. In particular, we should be able to expand a
plane wave solution:
$$e^{iky} = e^{ikr\sin\theta} = \sum_{n}a_ne^{in\theta}J_n(kr). \tag{8.85}$$
As we will see in a moment, the $a_n$'s are all unity, so in fact
$$e^{ikr\sin\theta} = \sum_{n=-\infty}^{\infty}e^{in\theta}J_n(kr). \tag{8.86}$$
This generating function is the historical origin of the Bessel functions. They
were introduced by the astronomer Wilhelm Bessel as a method of expressing
the eccentric anomaly of a planetary position as a Fourier sine series in the
mean anomaly, a modern version of Hipparchus' epicycles.

From the generating function we see that
$$J_n(x) = \frac{1}{2\pi}\int_0^{2\pi}e^{i(x\sin\theta - n\theta)}\,d\theta. \tag{8.87}$$
Whenever you come across a formula like this, involving the Fourier integral
of the exponential of a trigonometric function, you are probably dealing with
a Bessel function.
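
As a numerical illustration, the power series (8.74) and the integral representation (8.87) can be compared directly for integer order. This is only a sketch: the function names, the truncation at 40 terms, and the use of the trapezoidal rule with 2000 points are our choices, not the text's. Only the real part of the integrand is summed, because the imaginary part integrates to zero by the symmetry $\theta \to 2\pi - \theta$.

```python
import math


def bessel_j_series(n, x, terms=40):
    """Partial sum of the power series (8.74) for integer order n >= 0."""
    return sum(
        (-1) ** s / (math.factorial(s) * math.factorial(s + n)) * (x / 2) ** (2 * s + n)
        for s in range(terms)
    )


def bessel_j_integral(n, x, steps=2000):
    """Trapezoidal estimate of (8.87); the integrand is 2*pi-periodic in theta."""
    h = 2 * math.pi / steps
    return sum(math.cos(x * math.sin(k * h) - n * k * h) for k in range(steps)) / steps


for n in (0, 1, 2):
    for x in (0.5, 2.0, 5.0):
        print(n, x, bessel_j_series(n, x), bessel_j_integral(n, x))
```

The trapezoidal rule is a natural choice here because it converges very rapidly for smooth periodic integrands.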

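The generating function can also be written as
$$e^{\frac{x}{2}\left(t - \frac{1}{t}\right)} = \sum_{n=-\infty}^{\infty}t^nJ_n(x). \tag{8.88}$$
Expanding the left-hand side and using the binomial theorem, we find
$$\begin{aligned}
\mathrm{LHS} &= \sum_{m=0}^{\infty}\left(\frac{x}{2}\right)^m\frac{1}{m!}\left[\sum_{r+s=m}\frac{(r+s)!}{r!\,s!}(-1)^st^rt^{-s}\right] \\
&= \sum_{r=0}^{\infty}\sum_{s=0}^{\infty}(-1)^s\left(\frac{x}{2}\right)^{r+s}\frac{t^{r-s}}{r!\,s!} \\
&= \sum_{n=-\infty}^{\infty}t^n\left\{\sum_{s=0}^{\infty}\frac{(-1)^s}{s!\,(s+n)!}\left(\frac{x}{2}\right)^{2s+n}\right\}.
\end{aligned} \tag{8.89}$$
We recognize that the sum in the braces is the series expansion defining
$J_n(x)$. This therefore proves the generating function formula.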


Bessel identities 


There are many identities and integrals involving Bessel functions. The standard
reference is the monumental Treatise on the Theory of Bessel Functions
by G. N. Watson. Here are just a few formulae for your delectation:
i) Starting from the generating function
$$\exp\left\{\frac{1}{2}x\left(t - \frac{1}{t}\right)\right\} = \sum_{n=-\infty}^{\infty}J_n(x)t^n, \tag{8.90}$$
we can, with a few lines of work, establish the recurrence relations
$$2J_n'(x) = J_{n-1}(x) - J_{n+1}(x), \tag{8.91}$$
$$\frac{2n}{x}J_n(x) = J_{n-1}(x) + J_{n+1}(x), \tag{8.92}$$
together with
$$J_0'(x) = -J_1(x), \tag{8.93}$$
$$J_n(x+y) = \sum_{r=-\infty}^{\infty}J_r(x)J_{n-r}(y). \tag{8.94}$$
ii) From the series expansion for $J_\nu(x)$ we find
$$\left(\frac{1}{x}\frac{d}{dx}\right)^m\left\{x^\nu J_\nu(x)\right\} = x^{\nu-m}J_{\nu-m}(x). \tag{8.95}$$
iii) By similar methods, we find
$$\left(\frac{1}{x}\frac{d}{dx}\right)^m\left\{x^{-\nu}J_\nu(x)\right\} = (-1)^mx^{-\nu-m}J_{\nu+m}(x). \tag{8.96}$$
iv) Again from the series expansion, we find
$$\int_0^{\infty}J_0(ax)e^{-px}\,dx = \frac{1}{\sqrt{a^2+p^2}}. \tag{8.97}$$
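
The recurrence relations (8.91) and (8.92) lend themselves to a quick numerical sanity check. The sketch below is illustrative only: it evaluates $J_n$ from the integral representation (8.87), which also covers negative integer $n$, and it approximates the derivative by a central difference with a step size `eps` that we have chosen by hand. The function names are hypothetical.

```python
import math


def bessel_j(n, x, steps=2000):
    """J_n(x) for any integer n, from the integral representation (8.87)."""
    h = 2 * math.pi / steps
    return sum(math.cos(x * math.sin(k * h) - n * k * h) for k in range(steps)) / steps


def bessel_j_prime(n, x, eps=1e-5):
    """Central-difference approximation to dJ_n/dx (a numerical assumption)."""
    return (bessel_j(n, x + eps) - bessel_j(n, x - eps)) / (2 * eps)


def recurrence_residuals(n, x):
    """Left minus right sides of (8.91) and (8.92); both should be close to zero."""
    r1 = 2 * bessel_j_prime(n, x) - (bessel_j(n - 1, x) - bessel_j(n + 1, x))
    r2 = (2 * n / x) * bessel_j(n, x) - (bessel_j(n - 1, x) + bessel_j(n + 1, x))
    return r1, r2


for n in (0, 1, 3):
    print(n, recurrence_residuals(n, 3.0))
```

For $n = 0$ the first residual reduces to a check of (8.93), since $J_{-1}(x) = -J_1(x)$.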
Semi-classical picture 
The Schrödinger equation
$$-\frac{\hbar^2}{2m}\nabla^2\psi = E\psi \tag{8.98}$$
can be separated in cylindrical polar co-ordinates, and has eigenfunctions
$$\psi_{k,l}(r,\theta) = J_l(kr)e^{il\theta}. \tag{8.99}$$


Figure 8.9: $J_{100}(x)$.


The eigenvalues are $E = \hbar^2k^2/2m$. The quantity $L = \hbar l$ is the angular
momentum of the Schrödinger particle about the origin. If we impose rigid-wall
boundary conditions that $\psi_{k,l}(r,\theta)$ vanish on the circle $r = R$, then the
allowed $k$ form a discrete set $k_{l,n}$, where $J_l(k_{l,n}R) = 0$. To find the energy
eigenvalues we therefore need to know the location of the zeros of $J_l(x)$.
There is no closed form equation for these numbers, but they are tabulated.
The zeros for $kR \gg l$ are also approximated by the zeros of the asymptotic
expression
$$J_l(kR) \sim \sqrt{\frac{2}{\pi kR}}\cos\left(kR - \frac{1}{2}l\pi - \frac{1}{4}\pi\right), \tag{8.100}$$
which are located at
$$k_{l,n}R = \frac{1}{2}l\pi + \frac{1}{4}\pi + (2n+1)\frac{\pi}{2}. \tag{8.101}$$
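
To get a feel for how good the asymptotic estimate (8.101) is, one can compare it with zeros located numerically. The following sketch does this for $l = 0$ and $R = 1$, assuming that each true zero lies within $\pm 0.5$ of the asymptotic guess so that bisection has a sign change to work with (this holds for the three cases shown, but is an assumption in general). $J_0$ is evaluated from the integral representation (8.87), and all function names are ours.

```python
import math


def bessel_j(n, x, steps=2000):
    """J_n(x) for integer n, from the integral representation (8.87)."""
    h = 2 * math.pi / steps
    return sum(math.cos(x * math.sin(k * h) - n * k * h) for k in range(steps)) / steps


def bisect_zero(f, a, b, iterations=60):
    """Plain bisection; assumes f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(iterations):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


def asymptotic_zero(l, n, R=1.0):
    """The estimate (8.101) for the n-th zero, n = 0, 1, 2, ..."""
    return (0.5 * l * math.pi + 0.25 * math.pi + (2 * n + 1) * 0.5 * math.pi) / R


l = 0
for n in range(3):
    guess = asymptotic_zero(l, n)
    zero = bisect_zero(lambda x: bessel_j(l, x), guess - 0.5, guess + 0.5)
    print(n, guess, zero, zero - guess)
```

The discrepancy shrinks as $n$ grows, in line with the condition $kR \gg l$ under which (8.100) was derived.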
If we let $R \to \infty$, then the spectrum becomes continuous and we are
describing unconfined scattering states. Since the particles are free, their
classical motion is in a straight line at constant velocity. A classical particle
making a closest approach at a distance $r_{\min}$ has angular momentum
$L = pr_{\min}$. Since $p = \hbar k$ is the particle's linear momentum, we have $l = kr_{\min}$.
Because the classical particle is never closer than $r_{\min}$, the quantum mechanical
wavefunction representing such a particle will become evanescent
(i.e. tend rapidly to zero) as soon as $r$ is smaller than $r_{\min}$. We therefore
expect that $J_l(kr) \approx 0$ if $kr < l$. This effect is dramatically illustrated by
the Mathematica™ plot in figure 8.9.




Figure 8.10: The geometric origin of $x(r)$ and $\theta(r)$ in (8.102).


Improved asymptotic expressions, which give a better estimate of the
$J_l(kr)$ zeros, are the approximations
$$\begin{aligned}
J_l(kr) &\approx \sqrt{\frac{2}{\pi kx}}\cos(kx - l\theta - \pi/4), \quad r \gg r_{\min}, \\
N_l(kr) &\approx \sqrt{\frac{2}{\pi kx}}\sin(kx - l\theta - \pi/4), \quad r \gg r_{\min}.
\end{aligned} \tag{8.102}$$
Here $\theta = \cos^{-1}(r_{\min}/r)$ and $x = r\sin\theta$ are functions of $r$. They have a
geometric origin in the right-angled triangle in figure 8.10. The parameter
$x$ has the physical interpretation of being the distance, measured from
the point of closest approach to the origin, along the straight-line classical
trajectory. The approximation is quite accurate once $r$ exceeds $r_{\min}$ by more
than a few percent.


The asymptotic $r^{-1/2}$ fall-off of the Bessel function is also understandable
in the semiclassical picture.


Figure 8.11: A collection of trajectories, each missing the origin by $r_{\min}$,
leaves a “hole”.


Figure 8.12: The hole is visible in the real part of $\psi_{k,20}(r,\theta) = e^{i20\theta}J_{20}(kr)$.


By the uncertainty principle, a particle with definite angular momentum must
have completely uncertain angular position. The wavefunction $J_l(kr)e^{il\theta}$
therefore represents a coherent superposition of beams of particles approaching
from all directions, but all missing the origin by the same distance. The
density of classical particle trajectories is infinite at $r = r_{\min}$, forming a caustic.
By “conservation of lines”, the particle density falls off as $1/r$ as we move
outwards. The particle density is proportional to $|\psi|^2$, so $\psi$ itself decreases
as $r^{-1/2}$. In contrast to the classical particle density, the quantum mechanical
wavefunction amplitude remains finite at the caustic: the “geometric
optics” infinity is tempered by diffraction effects.


Exercise 8.3: The WKB (Wentzel-Kramers-Brillouin) approximation to a solution
of the Schrödinger equation
$$-\frac{d^2\psi}{dx^2} + V(x)\psi(x) = E\psi(x)$$
sets
$$\psi(x) \approx \frac{1}{\sqrt{\kappa(x)}}\exp\left\{\pm i\int_a^x\kappa(\xi)\,d\xi\right\},$$
where $\kappa(x) = \sqrt{E - V(x)}$, and $a$ is some conveniently chosen constant. This
form of the approximation is valid in classically allowed regions, where $\kappa$ is
real, and away from “turning points” where $\kappa$ goes to zero. In a classically
forbidden region, where $\kappa$ is imaginary, the solutions should decay exponentially.
The connection rule that matches the standing wave in the classically
allowed region onto the decaying solution is
$$\frac{1}{2\sqrt{|\kappa(x)|}}\exp\left\{-\left|\int_a^x\kappa(\xi)\,d\xi\right|\right\} \;\to\; \frac{1}{\sqrt{\kappa(x)}}\cos\left\{\left|\int_a^x\kappa(\xi)\,d\xi\right| - \frac{\pi}{4}\right\},$$
where $a$ is the classical turning point. (The connection is safely made only
in the direction of the arrow. This is because a small error in the phase of the
cosine will introduce a small admixture of the growing solution, which will
eventually swamp the decaying solution.)


Show that setting $y(r) = r^{-1/2}\psi(r)$ in Bessel's equation
$$-\frac{d^2y}{dr^2} - \frac{1}{r}\frac{dy}{dr} + \frac{l^2}{r^2}y = k^2y$$
reduces it to Schrödinger form
$$-\frac{d^2\psi}{dr^2} + \frac{(l^2 - 1/4)}{r^2}\psi = k^2\psi.$$
From this show that a WKB approximation to $y(r)$ is
$$\begin{aligned}
y(r) &\approx \frac{1}{(r^2-b^2)^{1/4}}\exp\left\{\pm ik\int_b^r\frac{\sqrt{\rho^2-b^2}}{\rho}\,d\rho\right\}, \quad r > b \\
&= \frac{1}{\sqrt{x(r)}}\exp\{\pm i[kx(r) - l\theta(r)]\},
\end{aligned}$$
where $kb = \sqrt{l^2 - 1/4} \approx l$, and $x(r)$ and $\theta(r)$ were defined in connection with
(8.102). Deduce that the expressions (8.102) are WKB approximations and
are therefore accurate once we are away from the classical turning point at
$r = b \equiv r_{\min}$.


8.3.2 Orthogonality and completeness 
We can write the equation obeyed by $J_n(kr)$ in Sturm-Liouville form. We
have
$$\frac{1}{r}\frac{d}{dr}\left(r\frac{dy}{dr}\right) + \left(k^2 - \frac{m^2}{r^2}\right)y = 0. \tag{8.103}$$
Comparison with the standard Sturm-Liouville equation shows that the weight
function, $w(r)$, is $r$, and the eigenvalues are $k^2$.
From Lagrange's identity we obtain
$$(k_1^2 - k_2^2)\int_0^RJ_m(k_1r)J_m(k_2r)\,r\,dr = R\left[k_2J_m(k_1R)J_m'(k_2R) - k_1J_m(k_2R)J_m'(k_1R)\right]. \tag{8.104}$$
We have no contribution from the origin on the right-hand side because all
$J_m$ Bessel functions except $J_0$ vanish there, whilst $J_0'(0) = 0$. For each $m$ we
get a set of orthogonal functions, $J_m(k_nr)$, provided the $k_nR$ are chosen
to be roots of $J_m(k_nR) = 0$ or of $J_m'(k_nR) = 0$.
We can find the normalization constants by differentiating (8.104) with
respect to $k_1$ and then setting $k_1 = k_2$ in the result. We find
$$\begin{aligned}
\int_0^R\left[J_m(kr)\right]^2r\,dr &= \frac{1}{2}R^2\left[\left[J_m'(kR)\right]^2 + \left(1 - \frac{m^2}{k^2R^2}\right)\left[J_m(kR)\right]^2\right] \\
&= \frac{1}{2}R^2\left[\left[J_m(kR)\right]^2 - J_{m-1}(kR)J_{m+1}(kR)\right].
\end{aligned} \tag{8.105}$$
(The second equality follows on applying the recurrence relations for the
$J_m(kr)$, and provides an expression that is perhaps easier to remember.) For
Dirichlet boundary conditions we will require $k_nR$ to be a zero of $J_m$, and so
we have
$$\int_0^R\left[J_m(k_nr)\right]^2r\,dr = \frac{1}{2}R^2\left[J_m'(k_nR)\right]^2. \tag{8.106}$$
For Neumann boundary conditions we require $k_nR$ to be a zero of $J_m'$. In
this case
$$\int_0^R\left[J_m(k_nr)\right]^2r\,dr = \frac{1}{2}R^2\left(1 - \frac{m^2}{k_n^2R^2}\right)\left[J_m(k_nR)\right]^2. \tag{8.107}$$
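
The Dirichlet normalization (8.106) can be checked numerically in the simplest case $m = 0$, $R = 1$. The sketch below makes several assumptions that are ours rather than the text's: the first zero of $J_0$ is hard-coded to its standard tabulated value, the integral is done with a composite Simpson rule, $J_n$ comes from a truncated power series (8.74), and the derivative is obtained from $J_0' = -J_1$, equation (8.93).

```python
import math


def bessel_j(n, x, terms=40):
    """Partial sum of the power series (8.74) for integer order n >= 0."""
    return sum(
        (-1) ** s / (math.factorial(s) * math.factorial(s + n)) * (x / 2) ** (2 * s + n)
        for s in range(terms)
    )


def simpson(f, a, b, intervals=200):
    """Composite Simpson rule; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


R = 1.0
k = 2.404825557695773 / R  # first zero of J_0 (standard tabulated value)

# Left-hand side of (8.106): integral of [J_0(k r)]^2 r dr from 0 to R.
lhs = simpson(lambda r: bessel_j(0, k * r) ** 2 * r, 0.0, R)
# Right-hand side: (1/2) R^2 [J_0'(kR)]^2, using J_0' = -J_1 from (8.93).
rhs = 0.5 * R ** 2 * bessel_j(1, k * R) ** 2

print(lhs, rhs)
```

The two printed numbers should agree to within the quadrature error of the Simpson rule.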


Figure 8.13: Cylinder geometry. 


Example: Harmonic function in cylinder. We wish to solve $\nabla^2V = 0$ within
a cylinder of height $L$ and radius $a$. The voltage is prescribed on the upper
surface of the cylinder: $V(r,\theta,L) = U(r,\theta)$. We are told that $V = 0$ on all
other parts of the boundary.

The general solution of Laplace's equation in the cylinder will be a sum of terms such
as
$$\left\{\begin{matrix}\sinh(kz) \\ \cosh(kz)\end{matrix}\right\}\times\left\{\begin{matrix}J_m(kr) \\ N_m(kr)\end{matrix}\right\}\times\left\{\begin{matrix}\sin(m\theta) \\ \cos(m\theta)\end{matrix}\right\}, \tag{8.108}$$
where the braces indicate a choice of upper or lower functions. We must take
only the $\sinh(kz)$ terms because we know that $V = 0$ at $z = 0$, and only the
$J_m(kr)$ terms because $V$ is finite at $r = 0$. The $k$'s are also restricted by the
boundary condition on the sides of the cylinder to be such that $J_m(ka) = 0$.
We therefore expand the prescribed voltage as
$$U(r,\theta) = \sum_{m,n}\sinh(k_{nm}L)J_m(k_{nm}r)\left[A_{nm}\sin(m\theta) + B_{nm}\cos(m\theta)\right], \tag{8.109}$$
and use the orthogonality of the trigonometric and Bessel functions to find
the coefficients to be
$$A_{nm} = \frac{2\operatorname{cosech}(k_{nm}L)}{\pi a^2[J_m'(k_{nm}a)]^2}\int_0^{2\pi}d\theta\int_0^aU(r,\theta)J_m(k_{nm}r)\sin(m\theta)\,r\,dr, \tag{8.110}$$
$$B_{nm} = \frac{2\operatorname{cosech}(k_{nm}L)}{\pi a^2[J_m'(k_{nm}a)]^2}\int_0^{2\pi}d\theta\int_0^aU(r,\theta)J_m(k_{nm}r)\cos(m\theta)\,r\,dr, \quad m \ne 0, \tag{8.111}$$
and
$$B_{n0} = \frac{1}{2}\,\frac{2\operatorname{cosech}(k_{n0}L)}{\pi a^2[J_0'(k_{n0}a)]^2}\int_0^{2\pi}d\theta\int_0^aU(r,\theta)J_0(k_{n0}r)\,r\,dr. \tag{8.112}$$
Then we fit the boundary data expansion to the general solution, and so find
$$V(r,\theta,z) = \sum_{m,n}\sinh(k_{nm}z)J_m(k_{nm}r)\left[A_{nm}\sin(m\theta) + B_{nm}\cos(m\theta)\right]. \tag{8.113}$$


Hankel transforms 


When the radius, $R$, of the region in which we are performing our eigenfunction
expansion becomes infinite, the eigenvalue spectrum will become continuous,
and the sum over the discrete $k_n$ Bessel-function zeros must be replaced by
an integral over $k$. By using the asymptotic approximation
$$J_n(kR) \sim \sqrt{\frac{2}{\pi kR}}\cos\left(kR - \frac{1}{2}n\pi - \frac{1}{4}\pi\right), \tag{8.114}$$
we may estimate the normalization integral as
$$\int_0^R\left[J_m(kr)\right]^2r\,dr \sim \frac{R}{\pi k} + O(1). \tag{8.115}$$
We also find that the asymptotic density of Bessel zeros is
$$\frac{dn}{dk} = \frac{R}{\pi}. \tag{8.116}$$
Putting these two results together shows that the continuous-spectrum
orthogonality and completeness relations are
$$\int_0^{\infty}J_n(kr)J_n(k'r)\,r\,dr = \frac{1}{k}\delta(k-k'), \tag{8.117}$$
$$\int_0^{\infty}J_n(kr)J_n(kr')\,k\,dk = \frac{1}{r}\delta(r-r'), \tag{8.118}$$
respectively. These two equations establish that the Hankel transform (also
called the Fourier-Bessel transform) of a function $f(r)$, which is defined by
$$F(k) = \int_0^{\infty}J_n(kr)f(r)\,r\,dr, \tag{8.119}$$
has as its inverse
$$f(r) = \int_0^{\infty}J_n(kr)F(k)\,k\,dk. \tag{8.120}$$
(See exercise 8.14 for an alternative derivation of the Hankel-transform pair.)
Some Hankel transform pairs:
$$\begin{aligned}
\int_0^{\infty}e^{-ar}J_0(kr)\,dr &= \frac{1}{\sqrt{k^2+a^2}}, \\
\int_0^{\infty}\frac{J_0(kr)}{\sqrt{k^2+a^2}}\,k\,dk &= \frac{e^{-ar}}{r}.
\end{aligned} \tag{8.121}$$
$$\begin{aligned}
\int_0^{\infty}\cos(ar)J_0(kr)\,dr &= \begin{cases}0, & k < a, \\ 1/\sqrt{k^2-a^2}, & k > a,\end{cases} \\
\int_a^{\infty}\frac{J_0(kr)}{\sqrt{k^2-a^2}}\,k\,dk &= \frac{1}{r}\cos(ar).
\end{aligned} \tag{8.122}$$
$$\begin{aligned}
\int_0^{\infty}\sin(ar)J_0(kr)\,dr &= \begin{cases}1/\sqrt{a^2-k^2}, & k < a, \\ 0, & k > a,\end{cases} \\
\int_0^{a}\frac{J_0(kr)}{\sqrt{a^2-k^2}}\,k\,dk &= \frac{1}{r}\sin(ar).
\end{aligned} \tag{8.123}$$
Example: Weber's disc problem. Consider a thin isolated conducting disc of
radius $a$ lying on the $x$-$y$ plane in $\mathbb{R}^3$. The disc is held at potential $V_0$. We
seek the potential $V$ in the entirety of $\mathbb{R}^3$, such that $V \to 0$ at infinity.

It is easiest to first find $V$ in the half-space $z \ge 0$, and then extend the
solution to $z < 0$ by symmetry. Because the problem is axisymmetric, we
will make use of cylindrical polar co-ordinates with their origin at the centre
of the disc. In the region $z \ge 0$ the potential $V(r,z)$ obeys
$$\begin{aligned}
\nabla^2V(r,z) &= 0, & z &> 0, \\
V(r,z) &\to 0, & |z| &\to \infty, \\
V(r,0) &= V_0, & r &< a, \\
\left.\frac{\partial V}{\partial z}\right|_{z=0} &= 0, & r &> a.
\end{aligned} \tag{8.124}$$
This is a mixed boundary value problem. We have imposed Dirichlet boundary
conditions on $r < a$ and Neumann boundary conditions for $r > a$.

We expand the axisymmetric solution of Laplace’s equation terms of 
Bessel functions as 


VES | A(k)e~*l Ja( br) dk, (8.125) 
0 
and so require the unknown coeffcient function A(k) to obey 


| MTD OR Se Ves pees 
0 


i kA(k)Jo(kr)dk = 0, r>a. (8.126) 


No elementary algorithm for solving such a pair of dual integral equations
exists. In this case, however, some inspired guesswork helps. By integrating
the first equation of the transform pair (8.122) with respect to $a$, we discover
that

$$\int_0^\infty \frac{\sin(ar)}{r}\, J_0(kr)\,dr = \begin{cases} \pi/2, & k<a,\\ \sin^{-1}(a/k), & k>a. \end{cases} \tag{8.127}$$

With this result in hand, we then observe that (8.123) tells us that the
function

$$A(k) = \frac{2V_0 \sin(ka)}{\pi k} \tag{8.128}$$

satisfies both equations. Thus

$$V(r,z) = \frac{2V_0}{\pi}\int_0^\infty e^{-k|z|}\sin(ka)\,J_0(kr)\,\frac{dk}{k}. \tag{8.129}$$

The potential on the plane $z = 0$ can be evaluated explicitly to be

$$V(r,0) = \begin{cases} V_0, & r<a,\\ (2V_0/\pi)\sin^{-1}(a/r), & r>a. \end{cases} \tag{8.130}$$

The charge distribution on the disc can also be found as

$$\begin{aligned}
\sigma(r) &= -\left.\frac{\partial V}{\partial z}\right|_{z=0+} + \left.\frac{\partial V}{\partial z}\right|_{z=0-}\\
&= \frac{4V_0}{\pi}\int_0^\infty \sin(ak)\,J_0(kr)\,dk\\
&= \frac{4V_0}{\pi\sqrt{a^2-r^2}}, \qquad r<a.
\end{aligned} \tag{8.131}$$
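As a plausibility check on (8.129), note that on the axis $r = 0$ we have $J_0(0) = 1$, and the remaining integral is elementary: $V(0,z) = (2V_0/\pi)\tan^{-1}(a/|z|)$, which tends to $V_0$ as $z \to 0$. The Python sketch below compares a direct quadrature of (8.129) with this closed form. The helper names `simpson` and `on_axis_potential`, the cut-off `k_max`, and the sample values are illustrative choices of ours, not part of the text.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def on_axis_potential(z, a=1.0, V0=1.0):
    """Evaluate (8.129) at r = 0 (where J_0 = 1) for z > 0 by quadrature."""
    def integrand(k):
        if k == 0.0:
            return a                      # limit of exp(-k z) sin(k a)/k as k -> 0
        return math.exp(-k * z) * math.sin(k * a) / k
    k_max = 40.0 / z                      # exp(-k z) < exp(-40) beyond this point
    return 2.0 * V0 / math.pi * simpson(integrand, 0.0, k_max)

for z in (0.5, 1.0, 2.0):
    closed_form = 2.0 / math.pi * math.atan(1.0 / z)
    print(z, on_axis_potential(z), closed_form)
```

Integrating (8.131) over the disc gives a total charge $8V_0 a$ in the units used here, which is a second quick consistency check on the solution.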
8.3.3 Modified Bessel functions


When $k$ is real the Bessel function $J_n(kr)$ and the Neumann function $N_n(kr)$
oscillate at large distance. When $k$ is purely imaginary, it is convenient
to combine them so as to have functions that grow or decay exponentially.
These combinations are the modified Bessel functions $I_n(kr)$ and $K_n(kr)$.
These functions are initially defined for non-integer $\nu$ by

$$I_\nu(x) = i^{-\nu} J_\nu(ix), \tag{8.132}$$

$$K_\nu(x) = \frac{\pi}{2\sin\nu\pi}\left[I_{-\nu}(x) - I_\nu(x)\right]. \tag{8.133}$$

The factor of $i^{-\nu}$ in the definition of $I_\nu(x)$ is inserted to make $I_\nu$ real. Our
definition of $K_\nu(x)$ is that in Abramowitz and Stegun’s Handbook of Mathematical
Functions. It differs from that of Whittaker and Watson, who divide
by $\tan\nu\pi$ instead of $\sin\nu\pi$.

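At short distance, and for $\nu > 0$,

$$I_\nu(x) = \left(\frac{x}{2}\right)^{\nu}\frac{1}{\Gamma(\nu+1)} + \cdots, \tag{8.134}$$

$$K_\nu(x) = \frac{1}{2}\,\Gamma(\nu)\left(\frac{x}{2}\right)^{-\nu} + \cdots. \tag{8.135}$$

When $\nu$ becomes an integer we must take limits, and in particular

$$I_0(x) = 1 + \frac{1}{4}x^2 + \cdots, \tag{8.136}$$

$$K_0(x) = -\left(\ln(x/2) + \gamma\right) + \cdots. \tag{8.137}$$

The large $x$ asymptotic behaviour is

$$I_\nu(x) \sim \frac{1}{\sqrt{2\pi x}}\,e^{x}, \qquad x \to \infty, \tag{8.138}$$

$$K_\nu(x) \sim \sqrt{\frac{\pi}{2x}}\,e^{-x}, \qquad x \to \infty. \tag{8.139}$$
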


From the expression for $J_n(x)$ as an integral, we have

$$I_n(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{in\theta}\,e^{x\cos\theta}\,d\theta = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta)\,e^{x\cos\theta}\,d\theta \tag{8.140}$$

for integer $n$. When $n$ is not an integer we still have an expression for $I_\nu(x)$
as an integral, but now it is

$$I_\nu(x) = \frac{1}{\pi}\int_0^{\pi}\cos(\nu\theta)\,e^{x\cos\theta}\,d\theta - \frac{\sin\nu\pi}{\pi}\int_0^\infty e^{-x\cosh t - \nu t}\,dt. \tag{8.141}$$

Here we need $|\arg x| < \pi/2$ for the second integral to converge. The origin
of the “extra” infinite integral must remain a mystery until we learn how
to use complex integral methods for solving differential equations. From the
definition of $K_\nu(x)$ in terms of $I_\nu$ we find

$$K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt, \qquad |\arg x| < \pi/2. \tag{8.142}$$
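The integral representations (8.140) and (8.142) are convenient for numerical work because the integrands are smooth and, in the case of $K_\nu$, decay doubly exponentially. The following Python sketch evaluates both by Simpson’s rule and compares them with two independent facts: the power series $I_0(x) = \sum_k (x/2)^{2k}/(k!)^2$ and the elementary form $K_{1/2}(x) = \sqrt{\pi/2x}\,e^{-x}$. The function names and the truncation point `t_max` are our own assumptions, chosen only for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def bessel_i(n, x):
    """I_n(x) for integer n from the angular integral (8.140)."""
    return simpson(lambda th: math.cos(n * th) * math.exp(x * math.cos(th)),
                   0.0, math.pi) / math.pi

def bessel_k(nu, x):
    """K_nu(x) for real x > 0 from (8.142), truncating the t-integral."""
    t_max = math.acosh(1.0 + 50.0 / x)    # integrand has dropped by exp(-50) here
    return simpson(lambda t: math.exp(-x * math.cosh(t)) * math.cosh(nu * t),
                   0.0, t_max)

x = 1.5
series_i0 = sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(30))
exact_k_half = math.sqrt(math.pi / (2 * x)) * math.exp(-x)
print(bessel_i(0, x), series_i0)
print(bessel_k(0.5, x), exact_k_half)
```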


Physics Illustration: Light propagation in optical fibres. Consider the propagation
of light of frequency $\omega_0$ down a straight section of optical fibre. Typical
fibres are made of two materials: an outer layer, or cladding, with refractive
index $n_2$, and an inner core with refractive index $n_1 > n_2$. The core of a fibre
used for communication is usually less than 10 μm in diameter.

We will treat the light field $E$ as a scalar. (This is not a particularly good
approximation for real fibres, but the complications due to the vector character
of the electromagnetic field are considerable.) We suppose that $E$ obeys

$$\frac{\partial^2 E}{\partial x^2} + \frac{\partial^2 E}{\partial y^2} + \frac{\partial^2 E}{\partial z^2} - \frac{n^2(x,y)}{c^2}\frac{\partial^2 E}{\partial t^2} = 0. \tag{8.143}$$

Here $n(x,y)$ is the refractive index of the fibre, which is assumed to lie
along the $z$ axis. We set

$$E(x,y,z,t) = \psi(x,y,z)\,e^{ik_0 z - i\omega_0 t}, \tag{8.144}$$

where $k_0 = \omega_0/c$. The amplitude $\psi$ is a (relatively) slowly varying envelope
function. Plugging into the wave equation we find that

$$\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2} + 2ik_0\frac{\partial\psi}{\partial z} + \left(\frac{n^2(x,y)}{c^2}\,\omega_0^2 - k_0^2\right)\psi = 0. \tag{8.145}$$

Because $\psi$ is slowly varying, we neglect the second derivative of $\psi$ with
respect to $z$, and this becomes

$$2ik_0\frac{\partial\psi}{\partial z} = -\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\psi + k_0^2\left(1 - n^2(x,y)\right)\psi, \tag{8.146}$$

which is the two-dimensional time-dependent Schrödinger equation, but with
$t$ replaced by $z/2k_0$, where $z$ is the distance down the fibre. The wave-modes
that will be trapped and guided by the fibre will be those corresponding to
bound states of the axisymmetric potential

$$V(x,y) = k_0^2\left(1 - n^2(r)\right). \tag{8.147}$$

If these bound states have (negative) “energy” $E_n$, then $\psi \propto e^{-iE_n z/2k_0}$, and
so the actual wavenumber for frequency $\omega_0$ is

$$k = k_0 - \frac{E_n}{2k_0}. \tag{8.148}$$

In order to have a unique propagation velocity for signals on the fibre, it
is therefore necessary that the potential support one, and only one, bound
state.

If

$$n(r) = \begin{cases} n_1, & r<a,\\ n_2, & r>a, \end{cases} \tag{8.149}$$

then the bound state solutions will be of the form

$$\psi(r,\theta) = \begin{cases} e^{in\theta}e^{i\beta z}J_n(\kappa r), & r<a,\\ A\,e^{in\theta}e^{i\beta z}K_n(\gamma r), & r>a, \end{cases} \tag{8.150}$$

where

$$\kappa^2 = (n_1^2 k_0^2 - \beta^2), \tag{8.151}$$

$$\gamma^2 = (\beta^2 - n_2^2 k_0^2). \tag{8.152}$$

To ensure that we have a solution decaying away from the core, we need $\beta$
to be such that both $\kappa$ and $\gamma$ are real. We therefore require

$$n_1^2 > \frac{\beta^2}{k_0^2} > n_2^2. \tag{8.153}$$

At the interface both $\psi$ and its radial derivative must be continuous, and so
we will have a solution only if $\beta$ is such that

$$\kappa\,\frac{J_n'(\kappa a)}{J_n(\kappa a)} = \gamma\,\frac{K_n'(\gamma a)}{K_n(\gamma a)}.$$

This Schrödinger approximation to the wave equation has other applications.
It is called the paraxial approximation.
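The matching condition for the guided modes can be explored numerically. The Python sketch below treats the axisymmetric mode $n = 0$, uses $J_0' = -J_1$ and $K_0' = -K_1$, and works with the dimensionless variables $u = \kappa a$, $w = \gamma a$, which satisfy $u^2 + w^2 = V^2$ with $V^2 = (n_1^2 - n_2^2)k_0^2 a^2$. The value $V = 2$, the function names, and the quadrature-based Bessel routines are assumptions made for illustration; the bisection bracket also assumes that $V$ lies below the first zero of $J_0$ (about 2.405), so that $J_0(u)$ does not vanish inside it.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def bessel_j(n, x):
    """J_n(x), integer n, from J_n(x) = (1/pi) int_0^pi cos(n t - x sin t) dt."""
    return simpson(lambda t: math.cos(n * t - x * math.sin(t)), 0.0, math.pi) / math.pi

def bessel_k(n, x):
    """K_n(x), x > 0, from the integral representation (8.142)."""
    t_max = math.acosh(1.0 + 50.0 / x)
    return simpson(lambda t: math.exp(-x * math.cosh(t)) * math.cosh(n * t), 0.0, t_max)

def mismatch(u, V):
    """Residual u J_1(u)/J_0(u) - w K_1(w)/K_0(w) of the n = 0 matching condition."""
    w = math.sqrt(V * V - u * u)
    return u * bessel_j(1, u) / bessel_j(0, u) - w * bessel_k(1, w) / bessel_k(0, w)

def fundamental_mode(V, tol=1e-8):
    """Bisect for u = kappa * a in (0, V); the residual changes sign exactly once."""
    lo, hi = 1e-3, V - 1e-3
    f_lo = mismatch(lo, V)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mismatch(mid, V)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

V = 2.0                                   # illustrative value only
u = fundamental_mode(V)
print("kappa*a =", u, " gamma*a =", math.sqrt(V * V - u * u))
```

Given $u$, the propagation constant follows from (8.151) as $\beta^2 = n_1^2 k_0^2 - u^2/a^2$.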
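8.3.4 Spherical Bessel functions


Consider the wave equation

$$\left(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)\varphi(r,\theta,\phi,t) = 0 \tag{8.154}$$

in spherical polar coordinates. To apply separation of variables, we set

$$\varphi = e^{i\omega t}\,Y_l^m(\theta,\phi)\,\chi(r), \tag{8.155}$$

and find that

$$\frac{d^2\chi}{dr^2} + \frac{2}{r}\frac{d\chi}{dr} - \frac{l(l+1)}{r^2}\chi + \frac{\omega^2}{c^2}\chi = 0. \tag{8.156}$$

Substitute $\chi = r^{-1/2}R(r)$ and we have

$$\frac{d^2R}{dr^2} + \frac{1}{r}\frac{dR}{dr} + \left(\frac{\omega^2}{c^2} - \frac{(l+\frac{1}{2})^2}{r^2}\right)R = 0. \tag{8.157}$$

This is Bessel’s equation with $\nu^2 \to (l+\frac{1}{2})^2$. Therefore the general solution
is

$$R = A\,J_{l+\frac{1}{2}}(kr) + B\,J_{-l-\frac{1}{2}}(kr), \tag{8.158}$$

where $k = |\omega|/c$. Now inspection of the series definition of the $J_\nu$ reveals
that

$$J_{\frac{1}{2}}(x) = \sqrt{\frac{2}{\pi x}}\,\sin x, \tag{8.159}$$

$$J_{-\frac{1}{2}}(x) = \sqrt{\frac{2}{\pi x}}\,\cos x, \tag{8.160}$$

so these Bessel functions are actually elementary functions. This is true of
all Bessel functions of half-integer order, $\nu = \pm 1/2, \pm 3/2, \ldots$. We define the
spherical Bessel functions by²

$$j_l(x) = \sqrt{\frac{\pi}{2x}}\,J_{l+\frac{1}{2}}(x), \tag{8.161}$$

$$n_l(x) = (-1)^{l+1}\sqrt{\frac{\pi}{2x}}\,J_{-(l+\frac{1}{2})}(x). \tag{8.162}$$

²We are using the definitions from Schiff’s Quantum Mechanics.
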
The first few are

$$\begin{aligned}
j_0(x) &= \frac{1}{x}\sin x,\\
j_1(x) &= \frac{1}{x^2}\sin x - \frac{1}{x}\cos x,\\
j_2(x) &= \left(\frac{3}{x^3} - \frac{1}{x}\right)\sin x - \frac{3}{x^2}\cos x,\\
n_0(x) &= -\frac{1}{x}\cos x,\\
n_1(x) &= -\frac{1}{x^2}\cos x - \frac{1}{x}\sin x,\\
n_2(x) &= -\left(\frac{3}{x^3} - \frac{1}{x}\right)\cos x - \frac{3}{x^2}\sin x.
\end{aligned}$$

Despite the appearance of negative powers of $x$, the $j_l(x)$ are all finite at
$x = 0$. The $n_l(x)$ all diverge to $-\infty$ as $x \to 0$. In general

$$j_l(x) = f_l(x)\sin x + g_l(x)\cos x, \tag{8.163}$$

$$n_l(x) = -f_l(x)\cos x + g_l(x)\sin x, \tag{8.164}$$

where $f_l(x)$ and $g_l(x)$ are polynomials in $1/x$.

We also define the spherical Hankel functions by

$$h_l^{(1)}(x) = j_l(x) + i\,n_l(x), \tag{8.165}$$

$$h_l^{(2)}(x) = j_l(x) - i\,n_l(x). \tag{8.166}$$

These behave like

$$h_l^{(1)}(x) \sim \frac{1}{x}\,e^{i(x - [l+1]\pi/2)}, \tag{8.167}$$

$$h_l^{(2)}(x) \sim \frac{1}{x}\,e^{-i(x - [l+1]\pi/2)}, \tag{8.168}$$

at large $x$.


The solution to the wave equation regular at the origin is therefore a sum
of terms such as

$$\varphi_{k,l,m}(r,\theta,\phi,t) = j_l(kr)\,Y_l^m(\theta,\phi)\,e^{-i\omega t}, \tag{8.169}$$

where $\omega = \pm ck$, with $k > 0$. For example, the plane wave $e^{ikz}$ has expansion

$$e^{ikz} = e^{ikr\cos\theta} = \sum_{l=0}^{\infty}(2l+1)\,i^l\,j_l(kr)\,P_l(\cos\theta), \tag{8.170}$$

or equivalently, using (8.66),

$$e^{i\mathbf{k}\cdot\mathbf{r}} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l} i^l\,j_l(kr)\left[Y_l^m(\hat{\mathbf{k}})\right]^* Y_l^m(\hat{\mathbf{r}}), \tag{8.171}$$

where $\hat{\mathbf{k}}$, $\hat{\mathbf{r}}$, unit vectors in the direction of $\mathbf{k}$ and $\mathbf{r}$ respectively, are used as
a shorthand notation to indicate the angles that should be inserted into the
spherical harmonics. This angular-momentum-adapted expansion of a plane
wave provides a useful tool in scattering theory.
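The partial-wave expansion (8.170) converges quickly once $l$ exceeds $kr$, which makes it easy to test numerically. The sketch below sums the series in Python and compares it with $e^{ikr\cos\theta}$. It is only an illustration: the function names, the cut-off `lmax`, and the choice to compute $j_l$ from its power series $j_l(x) = x^l\sum_k (-x^2/2)^k/[k!\,(2l+2k+1)!!]$ (adequate for moderate $x$) are our assumptions.

```python
import cmath
import math

def spherical_jn(l, x, terms=40):
    """j_l(x) from its power series; suitable for moderate x (say x < 10)."""
    term = x ** l
    for m in range(1, 2 * l + 2, 2):      # divide by (2l+1)!!
        term /= m
    total = 0.0
    for k in range(terms):
        total += term
        term *= -(x * x / 2.0) / ((k + 1) * (2 * l + 2 * k + 3))
    return total

def legendre_list(lmax, t):
    """P_0(t)..P_lmax(t) from (l+1) P_{l+1} = (2l+1) t P_l - l P_{l-1}."""
    ps = [1.0, t]
    for l in range(1, lmax):
        ps.append(((2 * l + 1) * t * ps[l] - l * ps[l - 1]) / (l + 1))
    return ps[: lmax + 1]

def partial_wave_sum(kr, theta, lmax=20):
    """Right-hand side of (8.170), truncated at l = lmax."""
    ps = legendre_list(lmax, math.cos(theta))
    return sum((2 * l + 1) * (1j ** l) * spherical_jn(l, kr) * ps[l]
               for l in range(lmax + 1))

kr, theta = 3.0, 0.7
print(partial_wave_sum(kr, theta))
print(cmath.exp(1j * kr * math.cos(theta)))
```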


Exercise 8.4: Peierls’ Problem. Critical Mass. The core of a nuclear device
consists of a sphere of fissile $^{235}$U of radius $R$. It is surrounded by a thick shell
of non-fissile material which acts as a neutron reflector, or tamper.


Figure 8.14: Fission core.


In the core, the fast neutron density $n(\mathbf{r},t)$ obeys

$$\frac{\partial n}{\partial t} = \nu\,n + D_F\nabla^2 n. \tag{8.172}$$

Here the term with $\nu$ accounts for the production of additional neutrons due to
induced fission. The term with $D_F$ describes the diffusion of the fast neutrons.
In the tamper the neutron flux obeys

$$\frac{\partial n}{\partial t} = D_T\nabla^2 n. \tag{8.173}$$

Both the neutron density $n$ and flux $\mathbf{j} \equiv D_{F,T}\nabla n$ are continuous across the
interface between the two materials. Find an equation determining the critical
radius $R_c$ above which the neutron density grows exponentially. Show that the
critical radius for an assembly with a tamper consisting of $^{238}$U ($D_T = D_F$) is
one-half of that for a core surrounded only by air ($D_T = \infty$), and so the use
of a thick $^{238}$U tamper reduces the critical mass by a factor of eight.


Factorization and recurrence


The equation obeyed by the spherical Bessel function is

$$-\frac{d^2\chi_l}{dx^2} - \frac{2}{x}\frac{d\chi_l}{dx} + \frac{l(l+1)}{x^2}\chi_l = \chi_l, \tag{8.174}$$

or, in Sturm-Liouville form,

$$-\frac{1}{x^2}\frac{d}{dx}\left(x^2\frac{d\chi_l}{dx}\right) + \frac{l(l+1)}{x^2}\chi_l = \chi_l. \tag{8.175}$$

The corresponding differential operator is formally self-adjoint with respect
to the inner product

$$\langle f, g\rangle = \int_0^\infty (f^* g)\,x^2\,dx. \tag{8.176}$$

Now, the operator

$$D_l = -\frac{d^2}{dx^2} - \frac{2}{x}\frac{d}{dx} + \frac{l(l+1)}{x^2} \tag{8.177}$$

factorizes as

$$D_l = \left(-\frac{d}{dx} + \frac{l-1}{x}\right)\left(\frac{d}{dx} + \frac{l+1}{x}\right) \tag{8.178}$$

or as

$$D_l = \left(\frac{d}{dx} + \frac{l+2}{x}\right)\left(-\frac{d}{dx} + \frac{l}{x}\right). \tag{8.179}$$

Since, with respect to the $w = x^2$ inner product, we have

$$\left(\frac{d}{dx}\right)^{\dagger} = -\frac{1}{x^2}\frac{d}{dx}\,x^2 = -\frac{d}{dx} - \frac{2}{x}, \tag{8.180}$$

we can write

$$D_l = A_l^{\dagger}A_l = A_{l+1}A_{l+1}^{\dagger}, \tag{8.181}$$

where

$$A_l = \left(\frac{d}{dx} + \frac{l+1}{x}\right). \tag{8.182}$$

From this we can deduce

$$A_l\,j_l \propto j_{l-1}, \tag{8.183}$$

$$A_{l+1}^{\dagger}\,j_l \propto j_{l+1}. \tag{8.184}$$

The constants of proportionality are in each case unity. The same recurrence
formulae hold for the spherical Neumann functions $n_l$.
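The ladder relations (8.183) and (8.184), with unit constants of proportionality, read $j_l' + \frac{l+1}{x}j_l = j_{l-1}$ and $-j_l' + \frac{l}{x}j_l = j_{l+1}$. A short Python sketch can confirm them against the closed forms for $j_0$, $j_1$, $j_2$ listed earlier, using a central-difference derivative. The function names and the step size `h` are illustrative choices of ours.

```python
import math

def j(l, x):
    """Closed forms of the spherical Bessel functions j_0, j_1 and j_2."""
    s, c = math.sin(x), math.cos(x)
    if l == 0:
        return s / x
    if l == 1:
        return s / x**2 - c / x
    if l == 2:
        return (3 / x**3 - 1 / x) * s - 3 * c / x**2
    raise ValueError("only l = 0, 1, 2 are tabulated in this sketch")

def ddx(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def lowering(l, x):
    """A_l j_l = (d/dx + (l+1)/x) j_l, which should reproduce j_{l-1}."""
    return ddx(lambda t: j(l, t), x) + (l + 1) / x * j(l, x)

def raising(l, x):
    """A_{l+1}^dagger j_l = (-d/dx + l/x) j_l, which should reproduce j_{l+1}."""
    return -ddx(lambda t: j(l, t), x) + l / x * j(l, x)

x = 2.3
print(lowering(2, x), j(1, x))
print(raising(1, x), j(2, x))
```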


8.4 Singular endpoints 


In this section we will exploit our knowledge of the Laplace eigenfunctions 
in spherical and plane polar coordinates to illustrate Weyl’s theory of self- 
adjoint boundary conditions at singular endpoints. We also connect Weyl’s 
theory with concepts from scattering theory. 


8.4.1 Weyl’s theorem 


Consider the Sturm-Liouville eigenvalue problem 


$$Ly \equiv -\frac{1}{w(r)}\left[p(r)y'\right]' + q(r)y = \lambda y \tag{8.185}$$

on the interval $[0,R]$. Here $p(r)$, $q(r)$ and $w(r)$ are all supposed real, so the
equation is formally self-adjoint with respect to the inner product

$$\langle u, v\rangle_w = \int_0^R w\,u^* v\,dr. \tag{8.186}$$

If $r = 0$ is a singular point of (8.185), then we will be unable to impose
boundary conditions of our accustomed form

$$a\,y(0) + b\,y'(0) = 0 \tag{8.187}$$

because one or both of the linearly independent solutions $y_1(r)$ and $y_2(r)$ will
diverge as $r \to 0$. The range of possibilities was enumerated by Weyl:

Theorem (Hermann Weyl, 1910): Suppose that $r = 0$ is a singular point and
$r = R$ a regular point of the differential equation (8.185). Then

I. Either:

   a) Limit-circle case: There exists a $\lambda_0$ such that both of the linearly
      independent solutions to (8.185) have a $w$ norm that is convergent
      in the vicinity of $r = 0$. In this case both solutions have convergent
      $w$ norm for all values of $\lambda$.

   Or

   b) Limit-point case: No more than one solution has convergent $w$
      norm for any $\lambda$.

II. In either case, whenever $\operatorname{Im}\lambda \ne 0$, there is at least one finite-norm
    solution. When $\lambda$ lies on the real axis there may or may not exist a
    finite-norm solution.


We will not attempt to prove Weyl’s theorem. The proof is not difficult and
may be found in many standard texts.³ It is just a little more technical than
the level of this book. We will instead illustrate it with enough examples to
make the result plausible, and its practical consequences clear.

When we come to construct the resolvent $R_\lambda(r,r')$ obeying

$$(L - \lambda I)R_\lambda(r,r') = \delta(r - r') \tag{8.188}$$

by writing it as a product of $y_<$ and $y_>$ we are obliged to choose a normalizable
function for $y_<$, the solution obeying the boundary condition at $r = 0$. We
must do this so that the range of $R_\lambda$ will be in $L^2[0,R]$. In the limit-point
case, and when $\operatorname{Im}\lambda \ne 0$, there is only one choice for $y_<$. There is therefore
a unique resolvent, a unique self-adjoint operator $L - \lambda I$ of which $R_\lambda$ is the
inverse, and hence $L$ is a uniquely specified differential operator.⁴

In the limit-circle case there is more than one choice for $y_<$ and hence more
than one way of making $L$ into a self-adjoint operator. To what boundary
conditions do these choices correspond?

Suppose that the two normalizable solutions for $\lambda = \lambda_0$ are $y_1(r)$ and
$y_2(r)$. The essence of Weyl’s theorem is that once we are sufficiently close
to $r = 0$ the exact value of $\lambda$ is unimportant and all solutions behave as
a linear combination of these two. We can therefore impose as a boundary
condition that the allowed solutions be proportional to a specified real linear
combination

$$y(r) \propto a\,y_1(r) + b\,y_2(r), \qquad r \to 0. \tag{8.189}$$

³For example: Ivar Stakgold, Boundary Value Problems of Mathematical Physics,
Volume I (SIAM 2000).

⁴When $\lambda$ is on the real axis then there may be no normalizable solution, and $R_\lambda$ cannot
exist. This will occur only when $\lambda$ is in the continuous spectrum of the operator $L$, and is
not a problem as the same operator $L$ is obtained for any $\lambda$.

This is a natural generalization of the regular case where we have a solution
$y_1(r)$ with boundary conditions $y_1(0) = 1$, $y_1'(0) = 0$, so $y_1(r) \sim 1$, and a
solution $y_2(r)$ with $y_2(0) = 0$, $y_2'(0) = 1$, so $y_2(r) \sim r$. The regular self-adjoint
boundary condition

$$a\,y(0) + b\,y'(0) = 0 \tag{8.190}$$

with real $a$, $b$ then forces $y(r)$ to be

$$y(r) \propto b\,y_1(r) - a\,y_2(r) \sim b\cdot 1 - a\,r, \qquad r \to 0. \tag{8.191}$$
Example: Consider the radial part of the Laplace eigenvalue problem in two
dimensions.

$$L\psi \equiv -\frac{1}{r}\frac{d}{dr}\left(r\frac{d\psi}{dr}\right) + \frac{m^2}{r^2}\psi = k^2\psi. \tag{8.192}$$

The differential operator $L$ is formally self-adjoint with respect to the inner
product

$$\langle\psi,\chi\rangle = \int_0^R \psi^*\chi\,r\,dr. \tag{8.193}$$

When $k^2 = 0$, the $m^2 \ne 0$ equation has solutions $\psi = r^{\pm m}$, and, of the
normalization integrals

$$\int_0^R |r^{m}|^2\,r\,dr, \qquad \int_0^R |r^{-m}|^2\,r\,dr, \tag{8.194}$$

only the first, containing the positive power of $r$, is convergent. For $m \ne 0$
we are therefore in Weyl’s limit-point case. For $m^2 = 0$, however, the $k^2 = 0$
solutions are $\psi_1(r) = 1$ and $\psi_2(r) = \ln r$. Both normalization integrals

$$\int_0^R 1^2\,r\,dr, \qquad \int_0^R |\ln r|^2\,r\,dr \tag{8.195}$$

converge and we are in the limit-circle case at $r = 0$. When $k^2 > 0$ these
solutions become

$$\begin{aligned}
J_0(kr) &= 1 - \frac{1}{4}(kr)^2 + \cdots,\\
N_0(kr) &= \left(\frac{2}{\pi}\right)\left[\ln(kr/2) + \gamma\right] + \cdots.
\end{aligned} \tag{8.196}$$

Both remain normalizable, in conformity with Weyl’s theorem. The self-adjoint
boundary conditions at $r \to 0$ are therefore that near $r = 0$ the
allowed functions become proportional to

$$1 + \alpha\ln r \tag{8.197}$$

with $\alpha$ some specified real constant.
Example: Consider the radial equation that arises when we separate the
Laplace eigenvalue problem in spherical polar coordinates.

$$-\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\psi}{dr}\right) + \frac{l(l+1)}{r^2}\psi = k^2\psi. \tag{8.198}$$

When $k = 0$ this has solutions $\psi = r^{l}$, $r^{-l-1}$. For non-zero $l$ only the first of
the normalization integrals

$$\int_0^R r^{2l}\,r^2\,dr, \qquad \int_0^R r^{-2l-2}\,r^2\,dr \tag{8.199}$$

is finite. Thus, for $l \ne 0$, we are again in the limit-point case, and the
boundary condition at the origin is uniquely determined by the requirement
that the solution be normalizable.

When $l = 0$, however, the two $k^2 = 0$ solutions are $\psi_1(r) = 1$ and
$\psi_2(r) = 1/r$. Both integrals

$$\int_0^R r^2\,dr, \qquad \int_0^R r^{-2}\,r^2\,dr \tag{8.200}$$

converge, so we are again in the limit-circle case. For positive $k^2$, these
solutions evolve into

$$\psi_{1,k}(r) = j_0(kr) = \frac{1}{k}\frac{\sin kr}{r}, \qquad \psi_{2,k}(r) = -k\,n_0(kr) = \frac{\cos kr}{r}. \tag{8.201}$$

Near $r = 0$, we have $\psi_{1,k} \sim 1$ and $\psi_{2,k} \sim 1/r$, exactly the same behaviour as
the $k^2 = 0$ solutions.

We obtain a self-adjoint operator if we choose a constant $a_s$, and demand
that all functions in the domain be proportional to

$$\psi(r) \sim 1 - \frac{a_s}{r} \tag{8.202}$$

as we approach $r = 0$. If we write the solution with this boundary condition
as

$$\begin{aligned}
\psi_k(r) = \frac{\sin(kr+\eta)}{r} &= \cos\eta\left(\frac{\sin(kr)}{r} + \tan\eta\,\frac{\cos(kr)}{r}\right)\\
&\sim k\cos\eta\left(1 + \frac{\tan\eta}{kr}\right),
\end{aligned} \tag{8.203}$$

we can read off the phase shift $\eta$ as

$$\tan\eta(k) = -k\,a_s. \tag{8.204}$$
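The small-$r$ limit used in (8.203) is easy to check numerically. The Python sketch below builds $\psi_k(r) = \sin(kr+\eta)/r$ with $\tan\eta = -k a_s$ and confirms that, after dividing by $k\cos\eta$, it approaches $1 - a_s/r$ as $r \to 0$. The function name `psi` and the sample values of $k$ and $a_s$ are arbitrary illustrative choices.

```python
import math

def psi(r, k, a_s):
    """Return sin(k r + eta)/r and eta, with tan(eta) = -k a_s as in (8.204)."""
    eta = math.atan(-k * a_s)
    return math.sin(k * r + eta) / r, eta

k, a_s = 0.5, 2.0
for r in (1e-1, 1e-2, 1e-3):
    value, eta = psi(r, k, a_s)
    scaled = value / (k * math.cos(eta))   # should approach 1 - a_s / r
    print(r, scaled, 1.0 - a_s / r)
```

The discrepancy between the two printed columns shrinks like $(kr)^2$ relative to the leading $1/r$ term, as expected from the expansions of $\sin kr$ and $\cos kr$.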


These boundary conditions arise in quantum mechanics when we study
the scattering of particles whose de Broglie wavelength is much larger than
the range of the scattering potential. The incident wave is unable to resolve
any of the internal structure of the potential and perceives its effect only as a
singular boundary condition at the origin. In this context the constant $a_s$ is
called the scattering length. This physical model explains why only the $l = 0$
partial waves have a choice of boundary condition: classical particles with
angular momentum $l \ne 0$ would miss the origin by a distance $r_{\min} = l/k$ and
never see the potential.

The quantum picture also helps explain the physical origin of the distinction
between the limit-point and limit-circle cases. A point potential can
have a bound state that extends far beyond the short range of the potential.
If the corresponding eigenfunction is normalizable, the bound particle has
a significant amplitude to be found at non-zero $r$, and this amplitude must
be included in the completeness relation and in the eigenfunction expansion
of the Green function. When the state is not normalizable, however, the
particle spends all its time very close to the potential, and its eigenfunction
makes zero contribution to the Green function and completeness sum at
any non-zero $r$. Any admixture of this non-normalizable state allowed by
the boundary conditions can therefore be ignored, and, as far as the external
world is concerned, all boundary conditions look alike. The next few
exercises will illustrate this.


Exercise 8.5: The two-dimensional “delta-function” potential. Consider the
quantum mechanical problem in $\mathbb{R}^2$

$$\left(-\nabla^2 + V(|\mathbf{r}|)\right)\psi = E\psi$$

with $V$ an attractive circular square well.

$$V(r) = \begin{cases} -\lambda/\pi a^2, & r<a,\\ 0, & r>a. \end{cases}$$

The factor of $\pi a^2$ has been inserted to make this a regulated version of $V(\mathbf{r}) =
-\lambda\delta^2(\mathbf{r})$. Let $\mu = \sqrt{\lambda/\pi a^2}$.

i) By matching the functions

$$\psi(r) \propto \begin{cases} J_0(\mu r), & r<a,\\ K_0(\kappa r), & r>a, \end{cases}$$

   at $r = a$, show that as $a$ becomes small, we can scale $\lambda$ towards zero
   in such a way that the well becomes infinitely deep yet there remains a
   single bound state with finite binding energy

$$E_0 \equiv \kappa^2 = \frac{4}{a^2}\,e^{-2\gamma}\,e^{-4\pi/\lambda}.$$

   It is only after scaling $\lambda$ in this way that we have a well-defined quantum
   mechanical problem with a “point” potential.

ii) Show that in the scaling limit, the associated wavefunction obeys the
   singular-endpoint boundary condition

$$\psi(r) \to 1 + \alpha\ln r, \qquad r \to 0,$$

   where

$$\alpha = \frac{1}{\gamma + \ln(\kappa/2)}.$$

   Observe that by varying $\kappa^2$ between 0 and $\infty$ we can make $\alpha$ be any
   real number. So the entire range of possible self-adjoint boundary conditions
   may be obtained by specifying the binding energy of an attractive
   potential.

iii) Assume that we have fixed the boundary conditions by specifying $\kappa$, and
   consider the scattering of unbound particles off the short-range potential.
   It is natural to define the phase shift $\eta(k)$ so that

$$\psi_k(r) = \cos\eta\,J_0(kr) - \sin\eta\,N_0(kr) \sim \sqrt{\frac{2}{\pi kr}}\,\cos(kr - \pi/4 + \eta), \qquad r \to \infty.$$

   Show that

$$\cot\eta = \left(\frac{2}{\pi}\right)\ln(k/\kappa).$$
Exercise 8.6: The three-dimensional “delta-function” potential. Repeat the
calculation of the previous exercise for the case of a three-dimensional delta-function
potential

$$V(r) = \begin{cases} -\lambda/(4\pi a^3/3), & r<a,\\ 0, & r>a. \end{cases}$$

i) Show that as we take $a \to 0$, the delta-function strength $\lambda$ can be adjusted
   so that the scattering length becomes

$$a_s = \left(\frac{\lambda}{4\pi a^2} - \frac{1}{a}\right)^{-1}$$

   and remains finite.

ii) Show that when this $a_s$ is positive, the attractive potential supports a
   single bound state with external wavefunction

$$\psi(r) \propto \frac{1}{r}\,e^{-\kappa r},$$

   where $\kappa = a_s^{-1}$.
Exercise 8.7: The pseudo-potential. Consider a particle of mass $\mu$ confined in
a large sphere of radius $R$. At the center of the sphere is a singular potential
whose effects can be parameterized by its scattering length $a_s$ and the resultant
phase shift

$$\eta(k) \approx \tan\eta(k) = -a_s k.$$

In the absence of the potential, the normalized $l = 0$ wavefunctions would be

$$\psi_n(r) = \sqrt{\frac{1}{2\pi R}}\;\frac{\sin k_n r}{r},$$

where $k_n = n\pi/R$.

i) Show that the presence of the singular potential perturbs the $\psi_n$ eigenstate
   so that its energy $E_n$ changes by an amount

$$\Delta E_n = \frac{\hbar^2}{2\mu}\,\frac{2a_s k_n^2}{R}.$$

ii) Show this energy shift can be written as if it were the result of applying
   first-order perturbation theory

$$\Delta E_n \approx \langle n|V_{ps}|n\rangle \equiv \int d^3r\,|\psi_n|^2\,V_{ps}(\mathbf{r})$$

   to an artificial pseudo-potential

$$V_{ps}(\mathbf{r}) = \frac{2\pi a_s\hbar^2}{\mu}\,\delta^3(\mathbf{r}).$$

Although the energy shift is small when $R$ is large, it is not a first-order
perturbation effect and the pseudo-potential is a convenient fiction which serves
to parameterize the effect of the true potential. Even the sign of the pseudo-potential
may differ from that of the actual short distance potential. For our
attractive “delta function”, for example, the pseudo-potential changes from being
attractive to being repulsive as the bound state is peeled off the bottom of
the unbound continuum. The change of sign occurs not by $a_s$ passing through
zero, but by it passing through infinity. It is difficult to manipulate a single
potential so as to see this dramatic effect, but when the particles have spin, and
a spin-dependent interaction potential, it is possible to use a magnetic field to
arrange for a bound state of one spin configuration to pass through the zero
of energy of the other. The resulting Feshbach resonance has the same effect
on the scattering length as the conceptually simpler shape resonance obtained
by tuning the single potential.


The pseudo-potential formula is commonly used to describe the pairwise
interaction of a dilute gas of particles of mass $m$, where it reads

$$V_{ps}(\mathbf{r}) = \frac{4\pi a_s\hbar^2}{m}\,\delta^3(\mathbf{r}). \tag{8.205}$$

The internal energy-density of the gas due to the two-body interaction then
becomes

$$u(\rho) = \frac{1}{2}\,\frac{4\pi a_s\hbar^2}{m}\,\rho^2,$$

where $\rho$ is the particle-number density.

The factor of two difference between the formula in the exercise and
(8.205) arises because the $\mu$ in the exercise must be understood as the reduced
mass $\mu = m^2/(m+m) = m/2$ of the pair of interacting particles.

Example: In $n$ dimensions, the “$l = 0$” part of the Laplace operator is

$$\frac{d^2}{dr^2} + \frac{n-1}{r}\frac{d}{dr}.$$

This is formally self-adjoint with respect to the natural inner product

$$\langle\psi,\chi\rangle_n = \int_0^\infty r^{n-1}\,\psi^*\chi\,dr. \tag{8.206}$$

The zero eigenvalue solutions are $\psi_1(r) = 1$ and $\psi_2(r) = r^{2-n}$. The second
of these ceases to be normalizable once $n \ge 4$. In four space dimensions and
above, therefore, we are always in the limit-point case. No point interaction,
no matter how strong, can affect the physics. This non-interaction result
extends, with slight modification, to the quantum field theory of relativistic
particles. Here we find that contact interactions become irrelevant or non-renormalizable
in more than four space-time dimensions.
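The power counting behind this example is simple enough to automate. With weight $r^{n-1}$, the norm of $r^p$ converges at the origin exactly when $2p + n - 1 > -1$. The short Python sketch below applies this test to the two zero-eigenvalue solutions $1$ and $r^{2-n}$. The function names are ours, and the classification is meant for $n \ge 3$ (for $n = 2$ the second solution is $\ln r$ rather than a power, although its norm also converges).

```python
def norm_converges_at_origin(power, dim):
    """True if int_0 r^(dim-1) |r^power|^2 dr converges at r = 0,
    i.e. if the total exponent 2*power + dim - 1 exceeds -1."""
    return 2 * power + dim - 1 > -1

def weyl_case(dim):
    """Classify the l = 0 radial Laplacian in dim >= 3 dimensions from the
    zero-eigenvalue solutions 1 and r^(2 - dim)."""
    both = (norm_converges_at_origin(0, dim)
            and norm_converges_at_origin(2 - dim, dim))
    return "limit-circle" if both else "limit-point"

for dim in range(3, 7):
    print(dim, weyl_case(dim))
```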


8.5 Further exercises and problems 


Here are some further problems involving Legendre polynomials, associated
Legendre functions and Bessel functions:


Exercise 8.8: A sphere of radius $a$ is made by joining two conducting hemispheres
along their equators. The hemispheres are electrically insulated from
one another and maintained at two different potentials $V_1$ and $V_2$.


a) Starting from the general expression

$$V(r,\theta) = \sum_{l=0}^{\infty}\left(a_l r^l + \frac{b_l}{r^{l+1}}\right)P_l(\cos\theta)$$

   find an integral expression for the coefficients $a_l$, $b_l$ that are relevant to
   the electric field outside the sphere. Evaluate the integrals giving $b_1$, $b_2$
   and $b_3$.

b) Use your results from part a) to compute the electric dipole moment of
   the sphere as a function of the potential difference $V_1 - V_2$.

c) Now the two hemispheres are electrically connected and the entire surface
   is at one potential. The sphere is immersed in a uniform electric field $\mathbf{E}$.
   What is its dipole moment now?


Problem 8.9: Tides and Gravity. The Earth is not exactly spherical. Two
major causes of the deviation from sphericity are the Earth’s rotation and the
tidal forces it feels from the Sun and the Moon. In this problem we will study
the effects of rotation and tides on a self-gravitating sphere of fluid of uniform
density $\rho_0$.


a) Consider the equilibrium of a nearly spherical body of fluid rotating
   homogeneously with angular velocity $\omega_0$. Show that the effect of rotation
   can be accounted for by introducing an “effective gravitational potential”

$$\varphi_{\rm eff} = \varphi_{\rm grav} + \frac{1}{3}\,\omega_0^2 R^2\left(P_2(\cos\theta) - 1\right),$$

   where $R$, $\theta$ are spherical coordinates defined with their origin in the
   centre of the body and $\hat{\mathbf{z}}$ along the axis of rotation.

b) A small planet is in a circular orbit about a distant massive star. It
   rotates about an axis perpendicular to the plane of the orbit so that it
   always keeps the same face directed towards the star. Show that the
   planet experiences an effective external potential

$$\varphi_{\rm tidal} = -\Omega^2 R^2 P_2(\cos\theta),$$

   together with a potential, of the same sort as in part a), that arises from
   the once-per-orbit rotation. Here $\Omega$ is the orbital angular velocity, and
   $R$, $\theta$ are spherical coordinates defined with their origin at the centre of
   the planet and $\hat{\mathbf{z}}$ pointing at the star.

c) Each of the external potentials slightly deforms the initially spherical
   planet so that the surface is given by

$$R(\theta,\phi) = R_0 + \eta\,P_2(\cos\theta).$$

   (With $\theta$ being measured with respect to different axes for the rotation
   and tidal effects.) Show that, to first order in $\eta$, this deformation does
   not alter the volume of the body. Observe that positive $\eta$ corresponds
   to a prolate spheroid and negative $\eta$ to an oblate one.

d) The gravitational field of the deformed spheroid can be found by approximating
   it as an undeformed homogeneous sphere of radius $R_0$, together
   with a thin spherical shell of radius $R_0$ and surface mass density
   $\sigma = \rho_0\,\eta\,P_2(\cos\theta)$. Use the general axisymmetric solution

$$\varphi(R,\theta,\phi) = \sum_{l=0}^{\infty}\left(A_l R^l + \frac{B_l}{R^{l+1}}\right)P_l(\cos\theta)$$

   of Laplace’s equation, together with Poisson’s equation

$$\nabla^2\varphi = 4\pi G\rho(\mathbf{r})$$

   for the gravitational potential, to obtain expressions for $\varphi_{\rm shell}$ in the
   regions $R > R_0$ and $R < R_0$.

e) The surface of the fluid will be an equipotential of the combined potentials
   of the homogeneous sphere, the thin shell, and the effective external
   potential of the tidal or centrifugal forces. Use this fact to find $\eta$ (to
   lowest order in the angular velocities) for the two cases. Do not include
   the centrifugal potential from part b) when computing the tidal distortion.
   We never include the variation of the centrifugal potential across a
   planet when calculating tidal effects. This is because this variation is due
   to the once-per-year rotation, and contributes to the oblate equatorial
   bulge and not to the prolate tidal bulge.⁵ (Answer:
   $\eta_{\rm rot} = -\frac{5}{2}\,\frac{\omega_0^2 R_0}{4\pi G\rho_0}$ and
   $\eta_{\rm tide} = \frac{15}{2}\,\frac{\Omega^2 R_0}{4\pi G\rho_0}$.)

Exercise 8.10: Dielectric Sphere. Consider a solid dielectric sphere of radius
$a$ and permittivity $\epsilon$. The sphere is placed in an electric field which takes
the constant value $\mathbf{E} = E_0\hat{\mathbf{z}}$ a long distance from the sphere. Recall that
Maxwell’s equations require that $D_\perp$ and $E_\parallel$ be continuous across the surface
of the sphere.


a) Use the expansions

$$\Phi_{\rm in} = \sum_l A_l\,r^l\,P_l(\cos\theta),$$

$$\Phi_{\rm out} = \sum_l \left(B_l\,r^l + C_l\,r^{-l-1}\right)P_l(\cos\theta)$$

   and find all non-zero coefficients $A_l$, $B_l$, $C_l$.

b) Show that the $\mathbf{E}$ field inside the sphere is uniform and of magnitude
   $\frac{3\epsilon_0}{\epsilon + 2\epsilon_0}E_0$.

c) Show that the electric field is unchanged if the dielectric is replaced by
   the polarization-induced surface charge density

$$\sigma_{\rm induced} = 3\epsilon_0\left(\frac{\epsilon - \epsilon_0}{\epsilon + 2\epsilon_0}\right)E_0\cos\theta.$$

   (Some systems of units may require extra $4\pi$’s in this last expression. In
   SI units $\mathbf{D} \equiv \epsilon\mathbf{E} = \epsilon_0\mathbf{E} + \mathbf{P}$, and the polarization-induced charge density
   is $\rho_{\rm induced} = -\nabla\cdot\mathbf{P}$.)


Exercise 8.11: Hollow Sphere. The potential on a spherical surface of radius
$a$ is $\Phi(\theta,\phi)$. We want to express the potential inside the sphere as an integral
over the surface in a manner analogous to the Poisson kernel in two
dimensions.


a) By using the generating function for Legendre polynomials, show that

$$\frac{1 - r^2}{(1 + r^2 - 2r\cos\theta)^{3/2}} = \sum_{l=0}^{\infty}(2l+1)\,r^l\,P_l(\cos\theta), \qquad r<1.$$

⁵Our earth rotates about its axis $365\frac{1}{4} + 1$ times in a year, not $365\frac{1}{4}$ times. The “+1”
is this effect.

b) Starting from the expansion

$$\Phi_{\rm in}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} A_{lm}\,r^l\,Y_l^m(\theta,\phi),$$

$$A_{lm} = \frac{1}{a^l}\int_{S^2}\left[Y_l^m(\theta,\phi)\right]^*\Phi(\theta,\phi)\,d\cos\theta\,d\phi,$$

   and using the addition formula for spherical harmonics, show that

$$\Phi_{\rm in}(r,\theta,\phi) = \frac{a(a^2 - r^2)}{4\pi}\int_{S^2}\frac{\Phi(\theta',\phi')}{(r^2 + a^2 - 2ar\cos\gamma)^{3/2}}\,d\cos\theta'\,d\phi',$$

   where $\cos\gamma = \cos\theta\cos\theta' + \sin\theta\sin\theta'\cos(\phi - \phi')$.

c) By setting $r = 0$, deduce that a three dimensional harmonic function
   cannot have a local maximum or minimum.


Problem 8.12: We have several times met with the Pöschl-Teller eigenvalue
problem

$$\left(-\frac{d^2}{dx^2} - n(n+1)\,{\rm sech}^2 x\right)\psi = E\psi, \qquad (\star)$$

in the particular case that $n = 1$. We now consider this problem for any
positive integer $n$.


a) Set $\xi = \tanh x$ in $(\star)$ and show that it becomes

$$\left(\frac{d}{d\xi}(1-\xi^2)\frac{d}{d\xi} + n(n+1) + \frac{E}{1-\xi^2}\right)\psi = 0.$$

b) Compare the equation in part a) with the associated Legendre equation
   and deduce that the bound-state eigenfunctions and eigenvalues of the
   original Pöschl-Teller equation are

$$\psi_m(x) = P_n^m(\tanh x), \qquad E_m = -m^2, \qquad m = 1,\ldots,n,$$

   where $P_n^m(\xi)$ is the associated Legendre function. Observe that the list
   of bound states does not include $\psi_0 = P_n^0(\tanh x) \equiv P_n(\tanh x)$. This is
   because $\psi_0$ is not normalizable, being the lowest of the unbound $E \ge 0$
   continuous-spectrum states.

c) Now seek continuous spectrum solutions to $(\star)$ in the form

$$\psi_k(x) = e^{ikx} f(\tanh x),$$

   and show that if we take $E = k^2$, where $k$ is any real number, then $f(\xi)$ obeys

$$(1-\xi^2)\frac{d^2 f}{d\xi^2} + 2(ik - \xi)\frac{df}{d\xi} + n(n+1)f = 0. \qquad (\star\star)$$

d) Let us denote by $P_n^{(k)}(\xi)$ the solutions of $(\star\star)$ that reduce to the Legendre
   polynomial $P_n(\xi)$ when $k = 0$. Show that the first few $P_n^{(k)}(\xi)$ are

$$\begin{aligned}
P_0^{(k)}(\xi) &= 1,\\
P_1^{(k)}(\xi) &= \xi - ik,\\
P_2^{(k)}(\xi) &= \tfrac{1}{2}\left(3\xi^2 - 1 - 3ik\xi - k^2\right).
\end{aligned}$$

   Explore the properties of the $P_n^{(k)}(\xi)$, and show that they include
   i) $P_n^{(k)}(-\xi) = (-1)^n P_n^{(-k)}(\xi)$.
   ii) $(n+1)P_{n+1}^{(k)}(\xi) = (2n+1)\,\xi\,P_n^{(k)}(\xi) - (n + k^2/n)\,P_{n-1}^{(k)}(\xi)$.
   iii) $P_n^{(k)}(1) = (1-ik)(2-ik)\cdots(n-ik)/n!$.
   (The $P_n^{(k)}(\xi)$ are the $\nu = -\mu = ik$ special case of the Jacobi polynomials
   $P_n^{(\nu,\mu)}(\xi)$.)


Problem 8.13: Bessel functions and impact parameters. In two dimensions we
can expand a plane wave as

$$e^{iky} = \sum_{n=-\infty}^{\infty} J_n(kr)\,e^{in\theta}.$$

a) What do you think the resultant wave will look like if we take only a
   finite segment of this sum? For example

$$\phi(x,y) = \sum_{l=10}^{17} J_l(kr)\,e^{il\theta}.$$

   Think about:

   i) The quantum interpretation of $\hbar l$ as angular momentum $= \hbar k d$,
      where $d$ is the impact parameter, the amount by which the incoming
      particle misses the origin.

   ii) Diffraction: one cannot have a plane wave of finite width.

b) After writing down your best guess for the previous part, confirm your
   understanding by using Mathematica or another package to plot the real
   part of $\phi$ as defined above. The following Mathematica code may work.

    Clear[bit,tot]
    bit[l_,x_,y_]:=Cos[l ArcTan[x,y]]BesselJ[l,Sqrt[x^2+y^2]]
    tot[x_,y_]:=Sum[bit[l,x,y],{l,10,17}]
    ContourPlot[tot[x,y],{x,-40,40},{y,-40,40},PlotPoints->200]
    Display["wave",%,"EPS"]

   Run it, or some similar code, as a batch file. Try different ranges for the
   sum.

Exercise 8.14: Consider the two-dimensional Fourier transform

$$\tilde f(\mathbf{k}) = \int e^{i\mathbf{k}\cdot\mathbf{x}} f(\mathbf{x})\,d^2x$$

of a function that in polar co-ordinates is of the form $f(r,\theta) = \exp\{-il\theta\}f(r)$.


a) Show that

$$\tilde f(\mathbf{k}) = 2\pi\,i^l\,e^{-il\theta_k}\int_0^\infty J_l(kr)\,f(r)\,r\,dr,$$

   where $k$, $\theta_k$ are the polar co-ordinates of $\mathbf{k}$.
b) Use the inversion formula for the two-dimensional Fourier transform to
   establish the inversion formula (8.120) for the Hankel transform

$$F(k) = \int_0^\infty J_l(kr)\,f(r)\,r\,dr.$$
Chapter 9


Integral Equations 


A problem involving a differential equation can often be recast as one involv- 
ing an integral equation. Sometimes this new formulation suggests a method 
of attack or approximation scheme that would not have been apparent in the 
original language. It is also usually easier to extract general properties of the 
solution when the problem is expressed as an integral equation. 


9.1 Illustrations 


Here are some examples: 
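A boundary-value problem: Consider the differential equation for the unknown $u(x)$

$$ -u'' + \lambda V(x)u = 0 \tag{9.1} $$

with the boundary conditions $u(0) = u(L) = 0$. To turn this into an integral
equation we introduce the Green function

$$ G(x,y) = \begin{cases} \frac{1}{L}\,x(y-L), & 0 \le x \le y \le L, \\ \frac{1}{L}\,y(x-L), & 0 \le y \le x \le L, \end{cases} \tag{9.2} $$

so that

$$ \frac{d^2}{dx^2} G(x,y) = \delta(x-y). \tag{9.3} $$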


Then we can pretend that $\lambda V(x)u(x)$ in the differential equation is a known
source term, and substitute it for "$f(x)$" in the usual Green function solution.
Since (9.1) reads $u'' = \lambda V(x)u$ and $G$ obeys (9.3), we end up with

$$ u(x) - \lambda \int_0^L G(x,y)V(y)u(y)\, dy = 0. \tag{9.4} $$


This integral equation for u has not solved the problem, but is equivalent
to the original problem. Note, in particular, that the boundary conditions 
are implicit in this formulation: if we set x = 0 or L in the second term, it 
becomes zero because the Green function is zero at those points. The integral 
equation then says that u(0) and u(L) are both zero. 

An initial value problem: Consider essentially the same differential equation 
as before, but now with initial data: 


$$ -u'' + V(x)u = 0, \qquad u(0) = 0, \quad u'(0) = 1. \tag{9.5} $$

In this case, we claim that the inhomogeneous integral equation

$$ u(x) - \int_0^x (x-t)V(t)u(t)\, dt = x \tag{9.6} $$

is equivalent to the given problem. Let us check the claim. First, the initial
conditions. Rewrite the integral equation as

$$ u(x) = x + \int_0^x (x-t)V(t)u(t)\, dt, \tag{9.7} $$

so it is manifest that $u(0) = 0$. Now differentiate to get

$$ u'(x) = 1 + \int_0^x V(t)u(t)\, dt. \tag{9.8} $$

This shows that $u'(0) = 1$, as required. Differentiating once more confirms
that $u'' = V(x)u$.
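
The equivalence can also be checked numerically. The following is a minimal sketch, not taken from the text, that marches the integral equation (9.7) forward in x with the trapezoidal rule. The function name `solve_volterra_ivp` and the test potential V(x) = 1 (for which the differential equation gives u = sinh x) are illustrative assumptions.

```python
import math

def solve_volterra_ivp(V, x_max, n):
    """Solve u(x) = x + int_0^x (x - t) V(t) u(t) dt on a uniform grid
    by marching forward in x with the trapezoidal rule."""
    h = x_max / n
    xs = [i * h for i in range(n + 1)]
    u = [0.0] * (n + 1)          # u(0) = 0 follows from the equation itself
    for i in range(1, n + 1):
        # Trapezoid weights: h/2 at t = 0 and h in the interior.
        # The endpoint t = x_i carries the factor (x_i - t) = 0, so the
        # step is explicit: u[i] depends only on values already computed.
        total = 0.5 * h * (xs[i] - xs[0]) * V(xs[0]) * u[0]
        for j in range(1, i):
            total += h * (xs[i] - xs[j]) * V(xs[j]) * u[j]
        u[i] = xs[i] + total
    return xs, u

# Illustrative check with V = 1, where u'' = u, u(0) = 0, u'(0) = 1 gives sinh(x).
xs, u = solve_volterra_ivp(lambda t: 1.0, 2.0, 400)
max_err = max(abs(ui - math.sinh(xi)) for xi, ui in zip(xs, u))
print(max_err)
```

Because the kernel (x − t) vanishes at t = x, each step uses only values of u that are already known. No boundary data are supplied separately: they are carried by the integral equation.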

These examples reveal that one advantage of the integral equation for- 
mulation is that the boundary or initial value conditions are automatically 
encoded in the integral equation itself, and do not have to be added as riders. 


9.2 Classification of integral equations 


The classification of linear integral equations is best described by a list: 


A) i) Limits on integrals fixed ⇒ Fredholm equation.
   ii) One integration limit is x ⇒ Volterra equation.
B) i) Unknown under integral only ⇒ Type I.
   ii) Unknown also outside integral ⇒ Type II.
C) i) Homogeneous.
   ii) Inhomogeneous.


For example,

$$ u(x) = \int_0^L G(x,y)u(y)\, dy \tag{9.9} $$

is a Type II homogeneous Fredholm equation, whilst

$$ u(x) = x + \int_0^x (x-t)V(t)u(t)\, dt \tag{9.10} $$

is a Type II inhomogeneous Volterra equation.
The equation

$$ f(x) = \int_a^b K(x,y)u(y)\, dy, \tag{9.11} $$

an inhomogeneous Type I Fredholm equation, is analogous to the matrix
equation

$$ Kx = b. \tag{9.12} $$

On the other hand, the equation

$$ u(x) = \frac{1}{\lambda}\int_a^b K(x,y)u(y)\, dy, \tag{9.13} $$

a homogeneous Type II Fredholm equation, is analogous to the matrix eigenvalue problem

$$ Kx = \lambda x. \tag{9.14} $$

Finally,

$$ f(x) = \int_a^x K(x,y)u(y)\, dy, \tag{9.15} $$

an inhomogeneous Type I Volterra equation, is the analogue of a system of
linear equations involving an upper triangular matrix.
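
The matrix analogy for the Volterra case can be made concrete. The sketch below is an illustration rather than a method from the text: it discretizes a Type I Volterra equation with the midpoint rule, which produces a triangular linear system that is solved one row at a time by substitution. The function name `solve_volterra_type1` and the test kernel K(x, y) = exp(x − y) with f(x) = exp(x) − 1, whose exact solution is u = 1, are assumptions made for the example.

```python
import math

def solve_volterra_type1(K, f, x_max, n):
    """Solve f(x) = int_0^x K(x, y) u(y) dy by the midpoint rule.
    Row i of the discretized system involves only u at the first i
    midpoints, so the system is triangular."""
    h = x_max / n
    mids = [(j + 0.5) * h for j in range(n)]
    u = []
    for i in range(1, n + 1):
        x = i * h
        # Contribution of the unknowns that are already determined.
        s = sum(K(x, mids[j]) * u[j] for j in range(i - 1))
        # Solve row i for the single new unknown u at the (i-1)th midpoint.
        u.append((f(x) / h - s) / K(x, mids[i - 1]))
    return mids, u

mids, u = solve_volterra_type1(lambda x, y: math.exp(x - y),
                               lambda x: math.exp(x) - 1.0,
                               2.0, 200)
max_dev = max(abs(ui - 1.0) for ui in u)
print(max_dev)
```

Each row introduces exactly one new unknown, which is why no matrix inversion is needed.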

The function K(x, y) appearing in these expressions is called the
kernel. The phrase “kernel of the integral operator” can therefore refer either
to the function K or the nullspace of the operator. The context should make
clear which meaning is intended.


9.3 Integral transforms


When the kernel of a Fredholm equation is of the form K(x − y), with x and
y taking values on the entire real line, then it is translation invariant and we
can solve the integral equation by using the Fourier transformation

$$ \tilde u(k) = \mathcal{F}(u) = \int_{-\infty}^{\infty} u(x) e^{ikx}\, dx, \tag{9.16} $$
$$ u(x) = \mathcal{F}^{-1}(\tilde u) = \int_{-\infty}^{\infty} \tilde u(k) e^{-ikx}\, \frac{dk}{2\pi}. \tag{9.17} $$


Integral equations involving translation-invariant Volterra kernels usually 
succumb to a Laplace transform 


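$$ \tilde u(p) = \mathcal{L}(u) = \int_0^\infty u(x) e^{-px}\, dx, \tag{9.18} $$
$$ u(x) = \mathcal{L}^{-1}(\tilde u) = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty} \tilde u(p) e^{px}\, dp. \tag{9.19} $$


The Laplace inversion formula is the Bromwich contour integral, where γ is
chosen so that all the singularities of ũ(p) lie to the left of the contour. In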
practice one finds the inverse Laplace transform by using a table of Laplace 
transforms, such as the Bateman tables of integral transforms mentioned in 
the introduction to chapter 8. 

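For kernels of the form K(x/y) the Mellin transform,

$$ \tilde u(\sigma) = \mathcal{M}(u) = \int_0^\infty u(x) x^{\sigma-1}\, dx, \tag{9.20} $$
$$ u(x) = \mathcal{M}^{-1}(\tilde u) = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty} \tilde u(\sigma) x^{-\sigma}\, d\sigma, \tag{9.21} $$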


is the tool of choice. Again the inversion formula requires a Bromwich contour 
integral, and so usually recourse to tables of Mellin transforms. 


9.3.1 Fourier methods 


The class of problems that succumb to a Fourier transform can be thought
of as a continuous version of a matrix problem where the entries in the matrix
depend only on their distance from the main diagonal (figure 9.1).

Figure 9.1: The matrix form of the equation $\int_{-\infty}^{\infty} K(x-y)u(y)\,dy = f(x)$.


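Example: Consider the type II Fredholm equation

$$ u(x) - \lambda \int_{-\infty}^{\infty} e^{-|x-y|} u(y)\, dy = f(x), \tag{9.22} $$

where we will assume that $\lambda < 1/2$. Here the x-space kernel operator

$$ K(x-y) = \delta(x-y) - \lambda e^{-|x-y|} \tag{9.23} $$

has Fourier transform

$$ \tilde K(k) = 1 - \frac{2\lambda}{k^2+1} = \frac{k^2 + (1-2\lambda)}{k^2+1} = \frac{k^2+a^2}{k^2+1}, \tag{9.24} $$

where $a^2 = 1 - 2\lambda$. From

$$ \left(\frac{k^2+a^2}{k^2+1}\right)\tilde u(k) = \tilde f(k) \tag{9.25} $$

we find

$$ \tilde u(k) = \left(\frac{k^2+1}{k^2+a^2}\right)\tilde f(k) = \left(1 + \frac{1-a^2}{k^2+a^2}\right)\tilde f(k). \tag{9.26} $$

Inverting the Fourier transform gives

$$ u(x) = f(x) + \frac{1-a^2}{2a}\int_{-\infty}^{\infty} e^{-a|x-y|} f(y)\, dy = f(x) + \frac{\lambda}{\sqrt{1-2\lambda}}\int_{-\infty}^{\infty} e^{-\sqrt{1-2\lambda}\,|x-y|} f(y)\, dy. \tag{9.27} $$

This solution is no longer valid when the parameter λ exceeds 1/2. This is
because zero then lies in the spectrum of the operator we are attempting to
invert. The spectrum is continuous and the Fredholm alternative does not
apply.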
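
A numerical spot-check of (9.27) may be helpful. The sketch below is illustrative only: it builds u from the right-hand side of (9.27) by quadrature and substitutes it back into (9.22). The Gaussian source term, the value λ = 0.3, the truncation of the infinite integrals and the helper names are all assumptions of the example.

```python
import math

LAM = 0.3                                # any value below 1/2
A = math.sqrt(1.0 - 2.0 * LAM)           # a^2 = 1 - 2*lambda

def f(x):
    return math.exp(-x * x)              # illustrative source term

def trapezoid(g, lo, hi, n):
    """Composite trapezoidal rule with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h

def u(x):
    # Right-hand side of (9.27); f is negligible outside [-8, 8].
    conv = trapezoid(lambda y: math.exp(-A * abs(x - y)) * f(y), -8.0, 8.0, 400)
    return f(x) + (LAM / A) * conv

def residual(x):
    # Left-hand side of (9.22) minus f(x); the integrand is negligible
    # outside [-15, 15] for the values of x used below.
    conv = trapezoid(lambda y: math.exp(-abs(x - y)) * u(y), -15.0, 15.0, 600)
    return u(x) - LAM * conv - f(x)

res = [abs(residual(x)) for x in (0.0, 1.0)]
print(res)
```

The residuals are limited by the quadrature error of the trapezoidal rule, so they are small but not zero.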
9.3.2 Laplace transform methods


The Volterra problem

$$ \int_0^x K(x-y)u(y)\, dy = f(x), \qquad 0 < x < \infty, \tag{9.28} $$

can also be solved by the application of an integral transform. In this case
we observe that the value of K(x) is only needed for positive x, and this suggests
that we take a Laplace transform over the positive real axis.

Figure 9.2: We only require the value of K(x) for x positive, and u and f
can be set to zero for x < 0.


Abel’s equation 


As an example of Laplace methods, consider Abel's equation

$$ f(x) = \int_0^x \frac{1}{\sqrt{x-y}}\, u(y)\, dy, \tag{9.29} $$

where we are given f(x) and wish to find u(x). Here it is clear that we need
f(0) = 0 for the equation to make sense. We have met this integral transformation before in the definition of the “half-derivative”. It is an example of
the more general equation of the form

$$ f(x) = \int_0^x K(x-y)u(y)\, dy. \tag{9.30} $$


Let us take the Laplace transform of both sides of (9.30):

$$ \mathcal{L}f(p) = \int_0^\infty e^{-px}\left(\int_0^x K(x-y)u(y)\, dy\right) dx = \int_0^\infty dx \int_0^x dy\, e^{-px} K(x-y)u(y). \tag{9.31} $$

Now we make the change of variables

$$ x = \xi + \eta, \qquad y = \eta. \tag{9.32} $$

Figure 9.3: Regions of integration for the convolution theorem: a) Integrating
over y at fixed x, then over x; b) Integrating over η at fixed ξ, then over ξ.

This has Jacobian

$$ \frac{\partial(x,y)}{\partial(\xi,\eta)} = 1, \tag{9.33} $$

and the integral becomes

$$ \mathcal{L}f(p) = \int_0^\infty\!\!\int_0^\infty e^{-p(\xi+\eta)} K(\xi)u(\eta)\, d\xi\, d\eta = \int_0^\infty e^{-p\xi}K(\xi)\, d\xi \int_0^\infty e^{-p\eta}u(\eta)\, d\eta = \mathcal{L}K(p)\, \mathcal{L}u(p). \tag{9.34} $$


Thus the Laplace transform of a Volterra convolution is the product of the 
Laplace transforms. We can now invert 


$$ u = \mathcal{L}^{-1}(\mathcal{L}f/\mathcal{L}K). \tag{9.35} $$


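For Abel's equation, we have

$$ K(x) = \frac{1}{\sqrt{x}}, \tag{9.36} $$

the Laplace transform of which is

$$ \mathcal{L}K(p) = \int_0^\infty x^{\frac12 - 1} e^{-px}\, dx = p^{-1/2}\,\Gamma\!\left(\tfrac12\right) = p^{-1/2}\sqrt{\pi}. \tag{9.37} $$

Therefore, the Laplace transform of the solution u(x) is

$$ \mathcal{L}u(p) = \frac{1}{\sqrt{\pi}}\, p^{1/2}\,(\mathcal{L}f) = \frac{1}{\pi}\left(\sqrt{\pi}\, p^{-1/2}\right)(p\,\mathcal{L}f). \tag{9.38} $$

Now f(0) = 0, and so

$$ p\,\mathcal{L}f = \mathcal{L}\left(\frac{d}{dx}f\right), \tag{9.39} $$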

as may be seen by an integration by parts in the definition. Using this 
observation, and depending on whether we put the p next to f or outside the 
parenthesis, we conclude that the solution of Abel’s equation can be written 
in two equivalent ways: 


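$$ u(x) = \frac{1}{\pi}\frac{d}{dx}\int_0^x \frac{1}{\sqrt{x-y}}\, f(y)\, dy = \frac{1}{\pi}\int_0^x \frac{1}{\sqrt{x-y}}\, f'(y)\, dy. \tag{9.40} $$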


Proving the equality of these two expressions was a problem we set ourselves 
in chapter 6. 

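Here is another way of establishing the equality: Assume for the moment
that K(0) is finite, and that, as we have already noted, f(0) = 0. Then,

$$ \frac{d}{dx}\int_0^x K(x-y)f(y)\, dy \tag{9.41} $$

is equal to

$$ K(0)f(x) + \int_0^x \partial_x K(x-y) f(y)\, dy = K(0)f(x) - \int_0^x \partial_y K(x-y) f(y)\, dy $$
$$ = K(0)f(x) - \Big[K(x-y)f(y)\Big]_{y=0}^{y=x} + \int_0^x K(x-y)f'(y)\, dy = \int_0^x K(x-y)f'(y)\, dy. \tag{9.42} $$

Since K(0) cancelled out, we need not worry that it is divergent! More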
rigorously, we should regularize the improper integral by raising the lower 
limit on the integral to a small positive quantity, and then taking the limit 
to zero at the end of the calculation. 
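
As a sketch (not part of the original derivation), the second form of (9.40) can be tested numerically. The substitution y = x sin²θ, which removes the inverse-square-root endpoint singularity, is a choice made for this example, as are the function names and the test pair u(y) = y, f(x) = (4/3)x^{3/2}.

```python
import math

def abel_forward(u, x, n=2000):
    """f(x) = int_0^x u(y) / sqrt(x - y) dy, as in (9.29).
    With y = x sin^2(theta) the integral becomes
    int_0^{pi/2} u(x sin^2(theta)) * 2 sqrt(x) sin(theta) d(theta)."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h                      # midpoint rule in theta
        total += u(x * math.sin(th) ** 2) * 2.0 * math.sqrt(x) * math.sin(th)
    return total * h

def abel_invert(df, x, n=2000):
    """u(x) = (1/pi) int_0^x f'(y) / sqrt(x - y) dy, second form of (9.40).
    df is the derivative f' of the data; f(0) = 0 is assumed."""
    return abel_forward(df, x, n) / math.pi

x0 = 1.5
f_num = abel_forward(lambda y: y, x0)                        # expect (4/3) x0^(3/2)
u_num = abel_invert(lambda y: 2.0 * math.sqrt(y), x0)        # f'(y) = 2 sqrt(y); expect x0
print(f_num, u_num)
```

The same quadrature routine serves for both directions because the inversion formula is itself an Abel transform, applied to f′ and divided by π.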


Radon transforms 


Figure 9.4: The geometry of the CAT scan Radon transformation, showing
the location of the point P with co-ordinates x = p cos θ − t sin θ, y = p sin θ +
t cos θ.


An Abel integral equation lies at the heart of the method for reconstructing 
the image in a computer aided tomography (CAT) scan. By rotating an 
X-ray source about a patient and recording the direction-dependent shadow, 
we measure the integral of his tissue density f(x,y) along all lines in a slice 
(which we will take to be the x,y plane) through his body. The resulting 
information is the Radon transform F of the function f. If we parametrize
the family of lines by p and θ, as shown in figure 9.4, we have

$$ F(p,\theta) = \int_{-\infty}^{\infty} f(p\cos\theta - t\sin\theta,\; p\sin\theta + t\cos\theta)\, dt = \int_{\mathbb{R}^2} \delta(x\cos\theta + y\sin\theta - p)\, f(x,y)\, dx\,dy. \tag{9.43} $$

We will assume that f is zero outside some finite region (the patient), and
so these integrals converge.

We wish to invert the transformation and recover f from the data F(p, θ).
This problem was solved by Johann Radon in 1917. Radon made clever use
of the Euclidean group to simplify the problem. He observed that we may
take the point O at which we wish to find f to be the origin, and defined¹

$$ F_O(p) = \frac{1}{2\pi}\int_0^{2\pi}\left\{\int_{\mathbb{R}^2} \delta(x\cos\theta + y\sin\theta - p)\, f(x,y)\, dx\,dy\right\} d\theta. \tag{9.44} $$

Thus $F_O(p)$ is the angular average over all lines tangent to a circle of radius p about the desired inversion point. Radon then observed that if he
additionally defines

$$ \bar f(r) = \frac{1}{2\pi}\int_0^{2\pi} f(r\cos\phi, r\sin\phi)\, d\phi \tag{9.45} $$


then he can substitute $\bar f(r)$ for f(x, y) in (9.44) without changing the value
of the integral. Furthermore $\bar f(0) = f(0,0)$. Hence, taking polar co-ordinates
in the x, y plane, he has

$$ F_O(p) = \frac{1}{2\pi}\int_0^{2\pi}\left\{\int \delta(r\cos\phi\cos\theta + r\sin\phi\sin\theta - p)\, \bar f(r)\, r\,d\phi\,dr\right\} d\theta. \tag{9.46} $$

We can now use

$$ \delta(g(\phi)) = \sum_n \frac{1}{|g'(\phi_n)|}\,\delta(\phi - \phi_n), \tag{9.47} $$

where the sum is over the zeros $\phi_n$ of $g(\phi) = r\cos(\theta - \phi) - p$, to perform the
φ integral. Any given point x = r cos φ, y = r sin φ lies on two distinct lines
if and only if p < r. Thus g(φ) has two zeros if p < r, but none if r < p.
Consequently

$$ F_O(p) = \frac{1}{2\pi}\int_0^{2\pi}\left\{\int_p^\infty \frac{2}{\sqrt{r^2-p^2}}\, \bar f(r)\, r\,dr\right\} d\theta. \tag{9.48} $$

Nothing in the inner integral depends on θ. The outer integral is therefore
trivial, and so

$$ F_O(p) = \int_p^\infty \frac{2}{\sqrt{r^2-p^2}}\, \bar f(r)\, r\,dr. \tag{9.49} $$

¹We trust that the reader will forgive the anachronism of our expressing Radon's formulae in terms of Dirac's delta function.

We can extract $F_O(p)$ from the data. We could therefore solve the Abel
equation (9.49) and recover the complete function $\bar f(r)$. We are only interested in $\bar f(0)$, however, and it is easier to verify a claimed solution. Radon asserts
that

$$ f(0,0) = \bar f(0) = -\frac{1}{\pi}\int_0^\infty \frac{1}{p}\left(\frac{\partial}{\partial p}F_O(p)\right) dp. \tag{9.50} $$

To prove that his claim is true we must first take the derivative of $F_O(p)$ and
show that

$$ \frac{\partial}{\partial p}F_O(p) = \int_p^\infty \frac{2p}{\sqrt{r^2-p^2}}\left(\frac{\partial}{\partial r}\bar f(r)\right) dr. \tag{9.51} $$


The details of this computation are left as an exercise. It is little different 
from the differentiation of the integral transform at the end of the last section. 
We then substitute (9.51) into (9.50) and evaluate the resulting integral 


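$$ I = -\frac{2}{\pi}\int_0^\infty\left\{\int_p^\infty \frac{1}{\sqrt{r^2-p^2}}\left(\frac{\partial}{\partial r}\bar f(r)\right) dr\right\} dp \tag{9.52} $$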


by exchanging the order of the integrations, as shown in figure 9.5. 


Figure 9.5: a) In (9.52) we integrate first over r and then over p. The inner r
integral is therefore from r = p to r = ∞. b) In (9.53) we integrate first over
p and then over r. The inner p integral therefore runs from p = 0 to p = r.


After the interchange we have 


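$$ I = -\frac{2}{\pi}\int_0^\infty\left\{\int_0^r \frac{1}{\sqrt{r^2-p^2}}\, dp\right\}\left(\frac{\partial}{\partial r}\bar f(r)\right) dr. \tag{9.53} $$

Since

$$ \int_0^r \frac{1}{\sqrt{r^2-p^2}}\, dp = \frac{\pi}{2}, \tag{9.54} $$

the inner integral is independent of r. We thus obtain

$$ I = -\int_0^\infty \left(\frac{\partial}{\partial r}\bar f(r)\right) dr = \bar f(0) = f(0,0). \tag{9.55} $$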


Radon’s inversion formula is therefore correct. 

Although Radon found a closed-form inversion formula, the numerical 
problem of reconstructing the image from the partial and noisy data obtained 
from a practical CAT scanner is quite delicate, and remains an active area 
of research. 
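
A quick consistency check of (9.49) and (9.50) can be made with a density chosen here for illustration rather than taken from the text: $f(x,y) = e^{-(x^2+y^2)}$, for which $\bar f(r) = e^{-r^2}$ and $f(0,0) = 1$. The substitution $s^2 = r^2 - p^2$ is an assumption of this worked example.

```latex
\begin{align*}
F_O(p) &= \int_p^\infty \frac{2}{\sqrt{r^2-p^2}}\, e^{-r^2}\, r\,dr
        = 2e^{-p^2}\int_0^\infty e^{-s^2}\,ds
        = \sqrt{\pi}\,e^{-p^2},
        \qquad (s^2 = r^2 - p^2,\ s\,ds = r\,dr), \\
-\frac{1}{\pi}\int_0^\infty \frac{1}{p}\,\frac{\partial F_O}{\partial p}\,dp
       &= -\frac{1}{\pi}\int_0^\infty \frac{1}{p}\left(-2p\sqrt{\pi}\,e^{-p^2}\right)dp
        = \frac{2}{\sqrt{\pi}}\int_0^\infty e^{-p^2}\,dp
        = 1 = f(0,0).
\end{align*}
```

The factor 1/p in the inversion formula is harmless here because the derivative of $F_O$ vanishes linearly at p = 0.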


9.4 Separable kernels 


Let 


$$ K(x,y) = \sum_{i=1}^N p_i(x) q_i(y), \tag{9.56} $$


where $\{p_i\}$ and $\{q_i\}$ are two linearly independent sets of functions. The
range of K is therefore the span $\langle p_i\rangle$ of the set $\{p_i\}$. Such kernels are said
to be separable. The theory of integral equations containing such kernels is 
especially transparent. 


9.4.1 Eigenvalue problem 


Consider the eigenvalue problem 


$$ \lambda u(x) = \int_D K(x,y)u(y)\, dy \tag{9.57} $$

for a separable kernel. Here, D is some range of integration, and x ∈ D. If
λ ≠ 0, we know that u has to be in the range of K, so we can write

$$ u(x) = \sum_i \xi_i p_i(x). \tag{9.58} $$

Inserting this into the integral, we find that our problem reduces to the finite
matrix eigenvalue equation

$$ \lambda \xi_i = A_{ij}\xi_j, \tag{9.59} $$

where

$$ A_{ij} = \int_D q_i(y) p_j(y)\, dy. \tag{9.60} $$

Matters are especially simple when $q_i = p_i^*$. In this case $A_{ij} = A_{ji}^*$, so the
matrix A is Hermitian and has N linearly independent eigenvectors. Further,
none of the N associated eigenvalues can be zero. To see that this is so
suppose that $u(x) = \sum_i \xi_i p_i(x)$ is an eigenvector with zero eigenvalue. In
other words, suppose that

$$ 0 = \sum_i p_i(x) \int_D p_i^*(y) p_j(y)\xi_j\, dy. \tag{9.61} $$

Since the $p_i(x)$ are linearly independent, we must have

$$ 0 = \int_D p_i^*(y) p_j(y)\xi_j\, dy \tag{9.62} $$

for each i separately. Multiplying by $\xi_i^*$ and summing we find

$$ 0 = \int_D \Big|\sum_j p_j(y)\xi_j\Big|^2 dy = \int_D |u(y)|^2\, dy, \tag{9.63} $$

and so u(x) itself must have been zero. The remaining (infinite in number)
eigenfunctions span $\langle q_i\rangle^\perp$ and have λ = 0.


9.4.2 Inhomogeneous problem 


It is easiest to discuss inhomogeneous separable-kernel problems by example. 
Consider the equation 


$$ u(x) = f(x) + \mu\int_0^1 K(x,y)u(y)\, dy, \tag{9.64} $$

where K(x, y) = xy. Here, f(x) and μ are given, and u(x) is to be found.
We know that u(x) must be of the form

$$ u(x) = f(x) + ax, \tag{9.65} $$

and the only task is to find the constant a. We plug u into the integral
equation and, after cancelling a common factor of x, we find

$$ a = \mu\int_0^1 y\,u(y)\, dy = \mu\int_0^1 y f(y)\, dy + a\mu\int_0^1 y^2\, dy. \tag{9.66} $$

The last term is equal to μa/3, so

$$ a\left(1 - \frac{1}{3}\mu\right) = \mu\int_0^1 y f(y)\, dy, \tag{9.67} $$

and finally

$$ u(x) = f(x) + x\,\frac{\mu}{1 - \mu/3}\int_0^1 y f(y)\, dy. \tag{9.68} $$

Notice that this solution is meaningless if μ = 3. We can relate this to the
eigenvalues of the kernel K(x, y) = xy. The eigenvalue problem for this
kernel is

$$ \lambda u(x) = \int_0^1 xy\,u(y)\, dy. \tag{9.69} $$

On substituting u(x) = ax, this reduces to λax = ax/3, and so λ = 1/3. All
other eigenvalues are zero. Our inhomogeneous equation was of the form

$$ (1 - \mu K)u = f \tag{9.70} $$

and the operator (1 − μK) has an infinite set of eigenfunctions with eigenvalue
1, and a single eigenfunction, $u_0(x) = x$, with eigenvalue (1 − μ/3). The
eigenvalue becomes zero, and hence the inverse ceases to exist, when μ = 3.

A solution to the problem (1 − μK)u = f may still exist even when μ = 3.
But now, applying the Fredholm alternative, we see that f must satisfy the
condition that it be orthogonal to all solutions of $(1 - \mu K)^\dagger v = 0$. Since our
kernel is Hermitian, this means that f must be orthogonal to the zero mode
$u_0(x) = x$. For the case of μ = 3, the equation is

$$ u(x) = f(x) + 3\int_0^1 xy\,u(y)\, dy, \tag{9.71} $$

and to have a solution f must obey $\int_0^1 y f(y)\, dy = 0$. We again set u =
f(x) + ax, and find

$$ a = 3\int_0^1 y f(y)\, dy + a\,3\int_0^1 y^2\, dy, \tag{9.72} $$

but now this reduces to a = a. The general solution is therefore

$$ u = f(x) + ax \tag{9.73} $$

with a arbitrary.
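
The reduction of the separable problem to a single constant is easy to reproduce numerically. The following minimal sketch evaluates the constant a from (9.67) by Simpson's rule and substitutes the result back into (9.64). The helper names, the choice f(x) = 1 and the value μ = 1 are assumptions made for illustration.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def solve_xy_kernel(f, mu):
    """Return u solving u(x) = f(x) + mu * int_0^1 x*y*u(y) dy via (9.68)."""
    if abs(1.0 - mu / 3.0) < 1e-12:
        # mu = 3: the operator (1 - mu K) has a zero eigenvalue.
        raise ValueError("mu = 3: no unique solution")
    a = mu * simpson(lambda y: y * f(y), 0.0, 1.0) / (1.0 - mu / 3.0)
    return lambda x: f(x) + a * x

f = lambda x: 1.0          # illustrative source term
mu = 1.0
u = solve_xy_kernel(f, mu)

# Substitute back into the integral equation at a sample point.
x = 0.4
rhs = f(x) + mu * x * simpson(lambda y: y * u(y), 0.0, 1.0)
print(u(x), rhs)
```

For this choice a = 3/4, so both printed numbers should agree with 1 + 0.75x at x = 0.4, up to rounding.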
9.5 Singular integral equations


Equations involving principal-part integrals, such as the airfoil equation

$$ \frac{1}{\pi} P\!\int_{-1}^{1} \varphi(x)\frac{1}{x-y}\, dx = f(y), \tag{9.74} $$

in which f is given and we are to find φ, are called singular integral equations.
Their solution depends on what conditions are imposed on the unknown
function φ(x) at the endpoints of the integration region. We will consider
only this simplest example here.²


9.5.1 Solution via Tchebychef polynomials 
Recall the definition of the Tchebychef polynomials from chapter 2. We set 


$$ T_n(x) = \cos(n\cos^{-1}x), \tag{9.75} $$
$$ U_{n-1}(x) = \frac{\sin(n\cos^{-1}x)}{\sin(\cos^{-1}x)} = \frac{1}{n}T_n'(x). \tag{9.76} $$


These are the Tchebychef polynomials of the first and second kind, respectively. The orthogonality of the functions cos nθ and sin nθ over the interval
[0, π] translates into

$$ \int_{-1}^{1} \frac{1}{\sqrt{1-x^2}}\, T_n(x)T_m(x)\, dx = h_n\delta_{nm}, \qquad n,m \ge 0, \tag{9.77} $$

where $h_0 = \pi$, $h_n = \pi/2$, $n > 0$, and

$$ \int_{-1}^{1} \sqrt{1-x^2}\, U_{n-1}(x)U_{m-1}(x)\, dx = \frac{\pi}{2}\delta_{nm}, \qquad n,m > 0. \tag{9.78} $$

The sets $\{T_n(x)\}$ and $\{U_n(x)\}$ are complete in $L^2_w[-1,1]$ with the weight functions $w = (1-x^2)^{-1/2}$ and $w = (1-x^2)^{1/2}$, respectively.

Rather less obvious are the principal-part integral identities (valid for
−1 < y < 1)

$$ P\!\int_{-1}^{1} \frac{1}{\sqrt{1-x^2}}\,\frac{1}{x-y}\, dx = 0, \tag{9.79} $$
$$ P\!\int_{-1}^{1} \frac{1}{\sqrt{1-x^2}}\, T_n(x)\frac{1}{x-y}\, dx = \pi U_{n-1}(y), \qquad n > 0, \tag{9.80} $$

and

$$ P\!\int_{-1}^{1} \sqrt{1-x^2}\, U_{n-1}(x)\frac{1}{x-y}\, dx = -\pi T_n(y), \qquad n > 0. \tag{9.81} $$

These correspond, after we set x = cos θ and y = cos φ, to the trigonometric
integrals

$$ P\!\int_0^\pi \frac{\cos n\theta}{\cos\theta - \cos\phi}\, d\theta = \pi\frac{\sin n\phi}{\sin\phi} \tag{9.82} $$

and

$$ P\!\int_0^\pi \frac{\sin\theta\sin n\theta}{\cos\theta - \cos\phi}\, d\theta = -\pi\cos n\phi, \tag{9.83} $$

respectively. We will motivate and derive these formulae at the end of this
section.

²The classic text is N. I. Muskhelishvili, Singular Integral Equations.

Granted the validity of these principal-part integrals we can solve the
integral equation

$$ \frac{1}{\pi} P\!\int_{-1}^{1} \varphi(x)\frac{1}{x-y}\, dx = f(y), \qquad y \in [-1,1], \tag{9.84} $$

for φ in terms of f, subject to the condition that φ be bounded at x = ±1.
We show that no solution exists unless f satisfies the condition

$$ \int_{-1}^{1} \frac{1}{\sqrt{1-x^2}}\, f(x)\, dx = 0, \tag{9.85} $$

but if f does satisfy this condition then there is a unique solution

$$ \varphi(y) = -\frac{\sqrt{1-y^2}}{\pi}\, P\!\int_{-1}^{1} \frac{1}{\sqrt{1-x^2}}\, f(x)\frac{1}{x-y}\, dx. \tag{9.86} $$

To understand why this is the solution, and why there is a condition on f,
expand

$$ f(x) = \sum_{n=1}^{\infty} b_n T_n(x). \tag{9.87} $$

Here, the condition on f translates into the absence of a term involving
$T_0 \equiv 1$ in the expansion. Then,

$$ \varphi(x) = -\sqrt{1-x^2}\sum_{n=1}^{\infty} b_n U_{n-1}(x), \tag{9.88} $$

with $b_n$ the coefficients that appear in the expansion of f, solves the problem.
That this is so may be seen on substituting this expansion for φ into the
integral equation and using the second of the principal-part identities. This
identity provides no way to generate a term with $T_0$; hence the constraint.
Next we observe that the expansion for φ is generated term-by-term from
the expansion for f by substituting this into the integral form of the solution
and using the first principal-part identity.
Similarly, we solve for φ(y) in

$$ \frac{1}{\pi} P\!\int_{-1}^{1} \varphi(x)\frac{1}{x-y}\, dx = f(y), \qquad y \in [-1,1], \tag{9.89} $$

where now φ is permitted to be singular at x = ±1. In this case there is
always a solution, but it is not unique. The solutions are

$$ \varphi(y) = -\frac{1}{\pi\sqrt{1-y^2}}\, P\!\int_{-1}^{1} \sqrt{1-x^2}\, f(x)\frac{1}{x-y}\, dx + \frac{C}{\sqrt{1-y^2}}, \tag{9.90} $$

where C is an arbitrary constant. To see this, expand

$$ f(x) = \sum_{n=1}^{\infty} a_n U_{n-1}(x), \tag{9.91} $$

and then

$$ \varphi(x) = \frac{1}{\sqrt{1-x^2}}\left(\sum_{n=1}^{\infty} a_n T_n(x) + C\right) \tag{9.92} $$

satisfies the equation for any value of the constant C. Again the expansion
for φ is generated from that of f by use of the second principal-part identity.


Explanation of the principal-part identities 


The principal-part identities can be extracted from the analytic properties of
the resolvent operator $R_\lambda(n-n') = (H - \lambda I)^{-1}_{n,n'}$ for a tight-binding model
of the conduction band in a one-dimensional crystal with nearest neighbour
hopping. The eigenfunctions $u_E(n)$ for this problem obey

$$ u_E(n+1) + u_E(n-1) = E\, u_E(n) \tag{9.93} $$

and are

$$ u_E(n) = e^{in\theta}, \qquad -\pi < \theta < \pi, \tag{9.94} $$

with energy eigenvalues E = 2 cos θ.
The resolvent $R_\lambda(n)$ obeys

$$ R_\lambda(n+1) + R_\lambda(n-1) - \lambda R_\lambda(n) = \delta_{n0}, \qquad n \in \mathbb{Z}, \tag{9.95} $$

and can be expanded in terms of the energy eigenfunctions as

$$ R_\lambda(n-n') = \sum_E \frac{u_E(n)u_E^*(n')}{E-\lambda} = \int_{-\pi}^{\pi} \frac{e^{i(n-n')\theta}}{2\cos\theta - \lambda}\,\frac{d\theta}{2\pi}. \tag{9.96} $$

If we set λ = 2 cos φ, we observe that

$$ \int_{-\pi}^{\pi} \frac{e^{in\theta}}{2\cos\theta - 2\cos\phi}\,\frac{d\theta}{2\pi} = \frac{1}{2i\sin\phi}\, e^{i|n|\phi}, \qquad \mathrm{Im}\,\phi > 0. \tag{9.97} $$

That this integral is correct can be confirmed by observing that it is evaluating the Fourier coefficient of the double geometric series

$$ \sum_{n=-\infty}^{\infty} e^{-in\theta} e^{i|n|\phi} = \frac{2i\sin\phi}{2\cos\theta - 2\cos\phi}, \qquad \mathrm{Im}\,\phi > 0. \tag{9.98} $$

By writing $e^{in\theta} = \cos n\theta + i\sin n\theta$ and observing that the sine term integrates
to zero, we find that

$$ \int_0^\pi \frac{\cos n\theta}{\cos\theta - \cos\phi}\, d\theta = \frac{\pi}{i\sin\phi}\left(\cos n\phi + i\sin n\phi\right), \tag{9.99} $$

where n > 0, and again we have taken Im φ > 0. Now let φ approach the
real axis from above, and apply the Plemelj formula. We find

$$ P\!\int_0^\pi \frac{\cos n\theta}{\cos\theta - \cos\phi}\, d\theta = \pi\frac{\sin n\phi}{\sin\phi}. \tag{9.100} $$

This is the first principal-part integral identity. The second identity,

$$ P\!\int_0^\pi \frac{\sin\theta\sin n\theta}{\cos\theta - \cos\phi}\, d\theta = -\pi\cos n\phi, \tag{9.101} $$

is obtained from the first by using the addition theorems for the sine and
cosine.
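
The first trigonometric identity can be spot-checked numerically. The sketch below is an illustration, not the derivation used in the text. It takes the n = 0 identity (9.79) as given, so that subtracting cos nφ from the numerator leaves the principal value unchanged while making the integrand regular at θ = φ. The function name and the sample angle φ = 1 are assumptions.

```python
import math

def pv_cos_integral(n, phi, m=2000):
    """P int_0^pi cos(n t) / (cos t - cos phi) dt for n >= 1.
    Assumes the n = 0 principal value vanishes, so the regularized
    integrand (cos(n t) - cos(n phi)) / (cos t - cos phi) may be used."""
    h = math.pi / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * h        # midpoints; none coincides with phi = 1.0 here
        total += (math.cos(n * t) - math.cos(n * phi)) / (math.cos(t) - math.cos(phi))
    return total * h

phi = 1.0
errors = []
for n in range(1, 5):
    exact = math.pi * math.sin(n * phi) / math.sin(phi)
    errors.append(abs(pv_cos_integral(n, phi) - exact))
print(errors)
```

The regularized integrand is a polynomial in cos t of degree n − 1, which is why a simple midpoint rule is adequate.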
9.6 Wiener-Hopf equations I


We have seen that Volterra equations of the form

$$ \int_0^x K(x-y)u(y)\, dy = f(x), \qquad 0 < x < \infty, \tag{9.102} $$

having translation invariant kernels, may be solved for u by using a Laplace
transform. The apparently innocent modification

$$ \int_0^\infty K(x-y)u(y)\, dy = f(x), \qquad 0 < x < \infty \tag{9.103} $$

leads to an equation that is much harder to deal with. In these Wiener-Hopf equations, we are still only interested in the upper left quadrant of the
continuous matrix K(x − y)

Figure 9.6: The matrix form of (9.103).

and K(x − y) still has entries depending only on their distance from the
main diagonal. Now, however, we make use of the values of K(x) for all of
−∞ < x < ∞. This suggests the use of a Fourier transform. The problem is
that, in order to Fourier transform, we must integrate over the entire real line
on both sides of the equation and this requires us to know the values of
f(x) for negative values of x, but we have not been given this information
(and do not really need it). We therefore make the replacement

$$ f(x) \to f(x) + g(x), \tag{9.104} $$

where f(x) is non-zero only for positive x, and g(x) non-zero only for negative
x. We then solve

$$ \int_0^\infty K(x-y)u(y)\, dy = \begin{cases} f(x), & 0 < x < \infty, \\ g(x), & -\infty < x < 0, \end{cases} \tag{9.105} $$

so as to find u and g simultaneously. In other words, we extend the problem
to one on the whole real line, but with the negative-x source term g(x) chosen
so that the solution u(x) is non-zero only for positive x. We represent this
pictorially in figure 9.7.

Figure 9.7: The matrix form of (9.105) with both f and g.

To find u and g we try to make an “LU” decomposition of the matrix K into
the product $K = L^{-1}U$ of an upper triangular matrix U(x − y) and a lower
triangular matrix $L^{-1}(x-y)$. Written out in full, the product $L^{-1}U$ is

$$ K(x-y) = \int_{-\infty}^{\infty} L^{-1}(x-t)U(t-y)\, dt. \tag{9.106} $$

Now the inverse of a lower triangular matrix is also lower triangular, and so
L(x − y) itself is lower triangular. This means that the function U(x) is zero
for negative x, whilst L(x) is zero when x is positive.


Figure 9.8: The matrix decomposition $K = L^{-1}U$.


If we can find such a decomposition, then on multiplying both sides by L,
equation (9.103) becomes

$$ \int_0^x U(x-y)u(y)\, dy = h(x), \qquad 0 < x < \infty, \tag{9.107} $$

where

$$ h(x) = \int_x^\infty L(x-y)f(y)\, dy, \qquad 0 < x < \infty. \tag{9.108} $$

These two equations come from the upper half of the full matrix equation
represented in figure 9.9.

Figure 9.9: The equation (9.107) and the definition (9.108) correspond to
the upper half of these two matrix equations.


The lower parts of the matrix equation have no influence on (9.107) and 
(9.108): The function h(x) depends only on f, and while g(x) should be 
chosen to give the column of zeros below h, we do not, in principle, need to 
know it. This is because we could solve the Volterra equation Uu = h (9.107) 
via a Laplace transform. In practice (as we will see) it is easier to find g(x), 
and then, knowing the (f,g) column vector, obtain u(x) by solving (9.105).
This we can do by Fourier transform. 

The difficulty lies in finding the LU decomposition. For finite matrices 
this decomposition is a standard technique in numerical linear algebra. It is
equivalent to the method of Gaussian elimination, which, although we were
probably never told its name, is the strategy taught in high school for solving 
simultaneous equations. For continuously infinite matrices, however, making 
such a decomposition demands techniques far beyond those learned in school. 
It is a particular case of the scalar Riemann-Hilbert problem, and its solution 
requires the use of complex variable methods. 

On taking the Fourier transform of (9.106) we see that we are being asked
to factorize

$$ \tilde K(k) = [\tilde L(k)]^{-1}\tilde U(k), \tag{9.109} $$

where

$$ \tilde U(k) = \int_0^\infty e^{ikx}U(x)\, dx \tag{9.110} $$

is analytic (i.e. has no poles or other singularities) in the region Im k > 0,
and similarly

$$ \tilde L(k) = \int_{-\infty}^0 e^{ikx}L(x)\, dx \tag{9.111} $$

has no poles for Im k < 0, these analyticity conditions being consequences
of the vanishing conditions U(x − y) = 0, x < y and L(x − y) = 0, x > y.
There will be more than one way of factoring $\tilde K$ into functions with these
no-pole properties, but, because the inverse of an upper or lower triangular matrix is also upper or lower triangular, the matrices $U^{-1}(x-y)$ and
$L^{-1}(x-y)$ have the same vanishing properties, and, because these inverse
matrices correspond to the reciprocals of the Fourier transform, we must
also demand that $\tilde U(k)$ and $\tilde L(k)$ have no zeros in the upper and lower half
plane respectively. The combined no-poles, no-zeros conditions will usually
determine the factors up to constants. If we are able to factorize $\tilde K(k)$ in
this manner, we have effected the LU decomposition. When $\tilde K(k)$ is a rational function of k we can factorize by inspection. In the general case, more
sophistication is required.

Example: Let us solve the equation

$$ u(x) - \lambda\int_0^\infty e^{-|x-y|}u(y)\, dy = f(x), \tag{9.112} $$

where we will assume that λ < 1/2. Here the kernel function is

$$ K(x,y) = \delta(x-y) - \lambda e^{-|x-y|}. \tag{9.113} $$

This has Fourier transform

$$ \tilde K(k) = 1 - \frac{2\lambda}{k^2+1} = \frac{k^2 + (1-2\lambda)}{k^2+1} = \left(\frac{k+ia}{k+i}\right)\left(\frac{k-i}{k-ia}\right)^{-1}, \tag{9.114} $$

where $a^2 = 1 - 2\lambda$. We were able to factorize this by inspection with

$$ \tilde U(k) = \frac{k+ia}{k+i}, \qquad \tilde L(k) = \frac{k-i}{k-ia}, \tag{9.115} $$

having poles and zeros only in the lower (respectively upper) half-plane. We
could now transform back into x space to find U(x − y), L(x − y) and solve
the Volterra equation Uu = h. It is, however, less effort to work directly
with the Fourier transformed equation in the form

$$ \left(\frac{k+ia}{k+i}\right)\tilde u_+(k) = \left(\frac{k-i}{k-ia}\right)\left(\tilde f_+(k) + \tilde g_-(k)\right). \tag{9.116} $$

Here we have placed subscripts on $\tilde f(k)$, $\tilde g(k)$ and $\tilde u(k)$ to remind us that these
Fourier transforms are analytic in the upper (+) or lower (−) half-plane. Since
the left-hand-side of this equation is analytic in the upper half-plane, so must
be the right-hand-side. We therefore choose $\tilde g_-(k)$ to eliminate the potential
pole at k = ia that might arise from the first term on the right. This we can
do by setting

$$ \left(\frac{k-i}{k-ia}\right)\tilde g_-(k) = \frac{\alpha}{k-ia} \tag{9.117} $$

for some as yet undetermined constant α. (Observe that the resultant $\tilde g_-(k)$
is indeed analytic in the lower half-plane. This analyticity ensures that g(x)
is zero for positive x.) We can now solve for $\tilde u(k)$ as

$$ \tilde u(k) = \left(\frac{k+i}{k+ia}\right)\left(\frac{k-i}{k-ia}\right)\tilde f_+(k) + \left(\frac{k+i}{k+ia}\right)\frac{\alpha}{k-ia} $$
$$ = \frac{k^2+1}{k^2+a^2}\,\tilde f_+(k) + \alpha\,\frac{k+i}{k^2+a^2} $$
$$ = \tilde f_+(k) + \frac{1-a^2}{k^2+a^2}\,\tilde f_+(k) + \alpha\,\frac{k+i}{k^2+a^2}. \tag{9.118} $$
The inverse Fourier transform of

$$ \frac{k+i}{k^2+a^2} \tag{9.119} $$

is

$$ \frac{i}{2|a|}\left(1 - |a|\,\mathrm{sgn}(x)\right)e^{-|a||x|}, \tag{9.120} $$

and that of

$$ \frac{1-a^2}{k^2+a^2} = \frac{2\lambda}{k^2 + (1-2\lambda)} \tag{9.121} $$

is

$$ \frac{\lambda}{\sqrt{1-2\lambda}}\, e^{-\sqrt{1-2\lambda}\,|x|}. \tag{9.122} $$

Consequently

$$ u(x) = f(x) + \frac{\lambda}{\sqrt{1-2\lambda}}\int_0^\infty e^{-\sqrt{1-2\lambda}\,|x-y|} f(y)\, dy + \beta\left(1 - \sqrt{1-2\lambda}\,\mathrm{sgn}\,x\right)e^{-\sqrt{1-2\lambda}\,|x|}. \tag{9.123} $$

Here β is some multiple of α, and we have used the fact that f(y) is zero
for negative y to make the lower limit on the integral 0 instead of −∞. We
determine the as yet unknown β from the requirement that u(x) = 0 for
x < 0. We find that this will be the case if we take

$$ \beta = -\frac{\lambda}{a(a+1)}\int_0^\infty e^{-ay} f(y)\, dy, \qquad a = \sqrt{1-2\lambda}. \tag{9.124} $$

The solution is therefore, for x > 0,

$$ u(x) = f(x) + \frac{\lambda}{a}\int_0^\infty e^{-a|x-y|} f(y)\, dy + \frac{\lambda(a-1)}{a(a+1)}\, e^{-ax}\int_0^\infty e^{-ay} f(y)\, dy, \qquad a = \sqrt{1-2\lambda}. \tag{9.125} $$

Not every invertible n-by-n matrix has a plain LU decomposition. For a
related reason not every Wiener-Hopf equation can be solved so simply. Instead there is a topological index theorem that determines whether solutions
can exist, and, if solutions do exist, whether they are unique. We shall therefore return to this problem once we have acquired a deeper understanding of
the interaction between topology and complex analysis.
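
The half-line solution can be spot-checked by substituting it back into (9.112). The sketch below is illustrative: it assumes the source term f(x) = exp(−Bx) with B = 2 and λ = 0.3, for which the integrals in (9.125) can be done in closed form, and it evaluates the remaining integral in (9.112) with a truncated trapezoidal rule. All names and parameter values are assumptions of the example.

```python
import math

LAM = 0.3                          # any value below 1/2
A = math.sqrt(1.0 - 2.0 * LAM)     # a = sqrt(1 - 2*lambda)
B = 2.0                            # decay rate of the illustrative source, B != A

def f(x):
    return math.exp(-B * x)

def u(x):
    """Solution (9.125) for x >= 0 with f(y) = exp(-B y)."""
    # J = int_0^inf exp(-A |x - y|) f(y) dy, split at y = x.
    J = (math.exp(-B * x) - math.exp(-A * x)) / (A - B) + math.exp(-B * x) / (A + B)
    # F = int_0^inf exp(-A y) f(y) dy.
    F = 1.0 / (A + B)
    return f(x) + (LAM / A) * J + LAM * (A - 1.0) / (A * (A + 1.0)) * math.exp(-A * x) * F

def residual(x, y_max=60.0, n=12000):
    """Left-hand side of (9.112) minus f(x), integral truncated at y_max."""
    h = y_max / n
    total = 0.5 * (math.exp(-abs(x)) * u(0.0) + math.exp(-abs(x - y_max)) * u(y_max))
    for i in range(1, n):
        y = i * h
        total += math.exp(-abs(x - y)) * u(y)
    return u(x) - LAM * total * h - f(x)

res = [abs(residual(x)) for x in (0.5, 1.0, 2.0)]
print(res)
```

The residuals are set by the quadrature error alone. Dropping the last term of (9.125) would leave a residual proportional to exp(−x), which shows that the homogeneous term fixed by β is needed.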


9.7 Some functional analysis 


We have hitherto avoided, as far as it is possible, the full rigours of mathe- 
matics. For most of us, and for most of the time, we can solve our physics 
problems by using calculus rather than analysis. It is worth, nonetheless, be- 
ing familiar with the proper mathematical language so that when something 
tricky comes up we know where to look for help. The modern setting for 
the mathematical study of integral and differential equations is the discipline 
of functional analysis, and the classic text for the mathematically inclined 
physicist is the four-volume set Methods of Modern Mathematical Physics by 
Michael Reed and Barry Simon. We cannot summarize these volumes in a few
paragraphs, but we can try to provide enough background for us to be able
to explain a few issues that may have puzzled the alert reader.

This section requires the reader to have sufficient background in real 
analysis to know what it means for a set to be compact. 


9.7.1 Bounded and compact operators 


i) A linear operator K : L² → L² is bounded if there is a positive number
   M such that

   $$ \|Kx\| \le M\|x\|, \qquad \forall x \in L^2. \tag{9.126} $$

   If K is bounded then the smallest such M is the norm of K, which we
   denote by ‖K‖. Thus

   $$ \|Kx\| \le \|K\|\,\|x\|. \tag{9.127} $$

   For a finite-dimensional matrix, ‖K‖ is the square root of the largest
   eigenvalue of K†K; for a Hermitian matrix this is the largest of the
   absolute values of the eigenvalues of K.
   The function Kx is a continuous function of x if, and only if, it is
   bounded. “Bounded” and “continuous” are therefore synonyms. Linear
   differential operators are never bounded, and this is the source of most
   of the complications in their theory.

ii) If the operators A and B are bounded, then so is AB and

   $$ \|AB\| \le \|A\|\,\|B\|. \tag{9.128} $$

iii) A linear operator K : L² → L² is compact (or completely continuous)
   if it maps bounded sets in L² to relatively compact sets (sets whose
   closure is compact). Equivalently, K is compact if the image sequence
   $Kx_n$ of every bounded sequence of functions $x_n$ contains a convergent
   subsequence. Compact ⇒ continuous, but not vice versa. One can
   show that, given any positive number M, a compact self-adjoint operator has only a finite number of eigenvalues λ outside the interval
   [−M, M]. The eigenvectors $u_n$ with non-zero eigenvalues span the range
   of the operator. Any vector can therefore be written

   $$ u = u_0 + \sum_i a_i u_i, \tag{9.129} $$

   where $u_0$ lies in the null space of K. The Green function of a linear
   differential operator defined on a finite interval is usually the integral
   kernel of a compact operator.

iv) If K is compact then

   $$ H = I + K \tag{9.130} $$

   is Fredholm. This means that H has a finite dimensional kernel and
   co-kernel, and that the Fredholm alternative applies.
v) An integral kernel $K(x,y)$ is Hilbert-Schmidt if
$$\int |K(\xi,\eta)|^2\,d\xi\,d\eta < \infty. \tag{9.131}$$
This means that $K$ can be expanded in terms of a complete orthonormal
set $\{\phi_m\}$ as
$$K(x,y) = \sum_{n,m=1}^{\infty} A_{nm}\,\phi_n(x)\,\phi_m^*(y) \tag{9.132}$$
in the sense that
$$\lim_{N,M\to\infty}\left\|\sum_{n,m=1}^{N,M} A_{nm}\,\phi_n\phi_m^* - K\right\| = 0. \tag{9.133}$$
Now the finite sum
$$\sum_{n,m=1}^{N,M} A_{nm}\,\phi_n(x)\,\phi_m^*(y) \tag{9.134}$$
is automatically compact since it is bounded and has finite-dimensional
range. (The unit ball in a Hilbert space is relatively compact $\Leftrightarrow$ the
space is finite dimensional.) Thus, Hilbert-Schmidt implies that $K$ is
approximated in norm by compact operators. But it is not hard to
show that a norm-convergent limit of compact operators is compact,
so $K$ itself is compact. Thus
$$\text{Hilbert-Schmidt} \Rightarrow \text{compact}.$$


It is easy to test a given kernel to see if it is Hilbert-Schmidt (simply 
use the definition) and therein lies the utility of the concept. 
If we have a Hilbert-Schmidt Green function $g$, we can recast our differential
equation as an integral equation with $g$ as kernel, and this is why the
Fredholm alternative works for a large class of linear differential equations.
Example: Consider the Legendre-equation operator
$$L = -\frac{d}{dx}(1-x^2)\frac{d}{dx} \tag{9.135}$$
acting on functions $u \in L^2[-1,1]$ with boundary conditions that $u$ be finite
at the endpoints. This operator has a normalized zero mode $u_0 = 1/\sqrt{2}$, so
it cannot have an inverse. There exists, however, a modified Green function
$g(x,x')$ that satisfies
$$L_x\,g = \delta(x-x') - \frac{1}{2}. \tag{9.136}$$
It is
$$g(x,x') = \ln 2 - \frac{1}{2} - \frac{1}{2}\ln\left[(1+x_>)(1-x_<)\right], \tag{9.137}$$
where $x_>$ is the greater of $x$ and $x'$, and $x_<$ the lesser. We may verify that
$$\int_{-1}^{1}\int_{-1}^{1} |g(x,x')|^2\,dx\,dx' < \infty, \tag{9.138}$$


so $g$ is Hilbert-Schmidt and therefore the kernel of a compact operator. The
eigenvalue problem
$$Lu_n = \lambda_n u_n \tag{9.139}$$
can be recast as the integral equation
$$\mu_n u_n(x) = \int_{-1}^{1} g(x,x')\,u_n(x')\,dx' \tag{9.140}$$
with $\mu_n = \lambda_n^{-1}$. The compactness of $g$ guarantees that there is a complete
set of eigenfunctions (these being the Legendre polynomials $P_n(x)$ for $n > 0$)
having eigenvalues $\mu_n = 1/n(n+1)$. The operator $g$ also has the eigenfunction
$P_0$ with eigenvalue $\mu_0 = 0$. This example provides the justification for the
claim that the “finite” boundary conditions we adopted for the Legendre
equation in chapter 8 give us a self adjoint operator.
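
The eigenvalue claim can be spot-checked numerically. The following is only a rough sketch, assuming a plain midpoint rule is accurate enough (the kernel is bounded in $x'$ for fixed interior $x$); the helper names `g`, `apply_g` and `legendre` are illustrative choices and not part of the text. It applies the modified Green function (9.137) to $P_0$, $P_1$, $P_2$ and compares with $\mu_n P_n$.

```python
import math

def g(x, xp):
    """Modified Green function (9.137) for the Legendre operator on [-1, 1]."""
    hi, lo = max(x, xp), min(x, xp)
    return math.log(2.0) - 0.5 - 0.5 * math.log((1.0 + hi) * (1.0 - lo))

def apply_g(u, x, n=4000):
    """Midpoint-rule estimate of the integral of g(x, x') u(x') dx' over [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        xp = -1.0 + (k + 0.5) * h      # midpoints never touch the endpoints
        total += g(x, xp) * u(xp)
    return total * h

def legendre(n):
    """P_0, P_1, P_2 written out explicitly (enough for this check)."""
    return [lambda x: 1.0, lambda x: x, lambda x: 0.5 * (3.0 * x * x - 1.0)][n]

for x in (-0.6, 0.1, 0.7):
    zero_mode = apply_g(legendre(0), x)                       # should be close to 0
    err1 = apply_g(legendre(1), x) - legendre(1)(x) / 2.0     # mu_1 = 1/2
    err2 = apply_g(legendre(2), x) - legendre(2)(x) / 6.0     # mu_2 = 1/6
    print(x, zero_mode, err1, err2)
```

All three printed deviations should be small compared with the values of $P_n$ themselves, consistent with $\mu_0 = 0$, $\mu_1 = 1/2$ and $\mu_2 = 1/6$.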

Note that $K(x,y)$ does not have to be bounded for $K$ to be Hilbert-Schmidt.
Example: The kernel
$$K(x,y) = \frac{1}{(x-y)^{\alpha}}, \quad |x|, |y| < 1 \tag{9.141}$$
is Hilbert-Schmidt provided $\alpha < \frac{1}{2}$.
Example: The kernel
$$K(x,y) = \frac{1}{2m}\,e^{-m|x-y|}, \quad x, y \in \mathbb{R} \tag{9.142}$$
is not Hilbert-Schmidt because $|K(x-y)|$ is constant along the lines
$x - y = \text{constant}$, which lie parallel to the diagonal. $K$ has a continuous
spectrum consisting of all positive real numbers less than $1/m^2$. It cannot
be compact, therefore, but it is bounded with $\|K\| = 1/m^2$. The integral
equation (9.22) contains this kernel, and the Fredholm alternative does not
apply to it.


9.7.2 Closed operators 


One motivation for our including a brief account of functional analysis is 
that an attentive reader will have realized that some of the statements we 
have made in earlier chapters appear to be inconsistent. We have asserted in 
chapter 2 that no significance can be attached to the value of an $L^2$ function
at any particular point — only integrated averages matter. In later chapters, 
though, we have happily imposed boundary conditions that require these 
very functions to take specified values at the endpoints of our interval. In this 
section we will resolve this paradox. The apparent contradiction is intimately 
connected with our imposing boundary conditions only on derivatives of lower 
order than that of the differential equation, but understanding why this
is so requires some function-analytic language. 

Differential operators $L$ are never continuous; we cannot deduce from
$u_n \to u$ that $Lu_n \to Lu$. Differential operators can be closed, however. A
closed operator is one for which whenever a sequence $u_n$ converges to a limit
$u$ and at the same time the image sequence $Lu_n$ also converges to a limit $f$,
then $u$ is in the domain of $L$ and $Lu = f$. The name is not meant to imply
that the domain of definition is closed, but indicates instead that the graph
of $L$, this being the set $\{u, Lu\}$ considered as a subset of $L^2[a,b] \times L^2[a,b]$,
contains its limit points and so is a closed set.
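
The failure of continuity can be made concrete. As an illustration only (the sequence below is a choice made here, not one taken from the text), take $u_n(x) = n^{-1/2}\sin n\pi x$ on $[0,1]$: its $L^2$ norm is $1/\sqrt{2n}$, which tends to zero, while the norm of its derivative is $\pi\sqrt{n/2}$, which grows without bound. The sketch estimates both norms with a midpoint rule.

```python
import math

def l2_norm(f, n=2000):
    """Midpoint-rule estimate of the L^2[0, 1] norm of f."""
    h = 1.0 / n
    return math.sqrt(sum(f((k + 0.5) * h) ** 2 for k in range(n)) * h)

def u(n):
    """u_n(x) = sin(n pi x) / sqrt(n): tends to zero in L^2."""
    return lambda x: math.sin(n * math.pi * x) / math.sqrt(n)

def du(n):
    """Derivative of u_n: its L^2 norm grows like sqrt(n)."""
    return lambda x: math.sqrt(n) * math.pi * math.cos(n * math.pi * x)

for n in (1, 4, 16, 64):
    print(n, l2_norm(u(n)), l2_norm(du(n)))
```

So $u_n \to 0$ does not force $Lu_n \to 0$ for $L = d/dx$, which is exactly what an unbounded operator permits.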

Any self-adjoint operator is automatically closed. To see why this is so,
recall that in defining the adjoint of an operator $A$, we say that $y$ is in the
domain of $A^\dagger$ if there is a $z$ such that $\langle y, Ax\rangle = \langle z, x\rangle$ for all $x$ in the domain
of $A$. We then set $A^\dagger y = z$. Now suppose that $y_n \to y$ and $A^\dagger y_n = z_n \to z$.
The Cauchy-Schwarz-Bunyakovsky inequality shows that the inner product
is a continuous function of its arguments. Consequently, if $x$ is in the domain
of $A$, we can take the limit of $\langle y_n, Ax\rangle = \langle A^\dagger y_n, x\rangle = \langle z_n, x\rangle$ to deduce that
$\langle y, Ax\rangle = \langle z, x\rangle$. But this means that $y$ is in the domain of $A^\dagger$, and $z = A^\dagger y$.
The adjoint of any operator is therefore a closed operator. A self-adjoint
operator, being its own adjoint, is therefore necessarily closed.

A deep result states that a closed operator defined on a closed domain is
bounded. Since differential operators are always unbounded, the domain of a
closed differential operator can never be a closed set.


An operator may not be closed but may be closable, in that we can
make it closed by including additional functions in its domain. The essential
requirement for closability is that we never have two sequences $u_n$ and $v_n$
which converge to the same limit, $w$, while $Lu_n$ and $Lv_n$ both converge, but
to different limits. Closability is equivalent to requiring that if $u_n \to 0$ and
$Lu_n$ converges, then $Lu_n$ converges to zero.


Example: Let $L = d/dx$. Suppose that $u_n \to 0$ and $Lu_n \to f$. If $\varphi$ is a
smooth $L^2$ function that vanishes at $0, 1$, then
$$\int_0^1 \varphi f\,dx = \lim_{n\to\infty}\int_0^1 \varphi\,\frac{du_n}{dx}\,dx = -\lim_{n\to\infty}\int_0^1 \varphi'\,u_n\,dx = 0. \tag{9.143}$$
Here we have used the continuity of the inner product to justify the
interchange of the order of limit and integral. By the same arguments we used when
dealing with the calculus of variations, we deduce that $f = 0$. Thus $d/dx$ is
closable.


If an operator is closable, we may as well add the extra functions to its 
domain and make it closed. Let us consider what closure means for the 
operator 


$$L = \frac{d}{dx}, \qquad D(L) = \{y \in C^1[0,1] : y'(0) = 0\}. \tag{9.144}$$


Here, in fixing the derivative at the endpoint, we are imposing a boundary 
condition of higher order than we ought. 


Consider the sequence of differentiable functions $y_a$ shown in figure 9.10.
These functions have vanishing derivative at $x = 0$, but tend in $L^2$ to a
function $y$ whose derivative is non-zero at $x = 0$.

Figure 9.10: $\lim_{a\to 0} y_a = y$ in $L^2[0,1]$.

Figure 9.11 shows that the derivative of these functions also converges in $L^2$.

Figure 9.11: $y_a' \to y'$ in $L^2[0,1]$.


If we want L to be closed, we should therefore extend the domain of definition 
of L to include functions with non-vanishing endpoint derivative. We can also 
use this method to add to the domain of L functions that are only piecewise 
differentiable — i.e. functions with a discontinuous derivative. 

Now consider what happens if we try to extend the domain of 


$$L = \frac{d}{dx}, \qquad D(L) = \{y, y' \in L^2 : y(0) = 0\}, \tag{9.145}$$


to include functions that do not vanish at the endpoint. Take the
sequence of functions $y_a$ shown in figure 9.12. These functions vanish at the
origin, and converge in $L^2$ to a function that does not vanish at the origin.

Figure 9.12: $\lim_{a\to 0} y_a = y$ in $L^2[0,1]$.


Now, as figure 9.13 shows, the derivatives converge towards the derivative
of the limit function, together with a delta function near the origin. The
area under the functions $|y_a'(x)|^2$ grows without bound and the sequence $Ly_a$
becomes infinitely far from the derivative of the limit function when distance
is measured in the $L^2$ norm.

Figure 9.13: $y_a' \to \delta(x)$, but the delta function is not an element of $L^2[0,1]$.


We therefore cannot use closure to extend the domain to include these
functions. Another way of saying this is that, in order for the weak derivative of
$y$ to be in $L^2$, and therefore for $y$ to be in the domain of $d/dx$, the function $y$
need not be classically differentiable, but its $L^2$ equivalence class must
contain a continuous function, and continuous functions do have well-defined
values. It is the values of this continuous representative that are constrained
by the boundary conditions.
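
A concrete family makes the blow-up explicit. The profile used here is an assumption made purely for illustration, since the text does not specify the functions drawn in the figures: take the ramp $y_a(x) = \min(x/a, 1)$, which vanishes at the origin and tends to the constant function $1$. Then $\|y_a - 1\|^2 = a/3 \to 0$ while $\|y_a'\|^2 = 1/a \to \infty$. The sketch below checks both by quadrature.

```python
def ramp_norms(a, n=20000):
    """Return (||y_a - 1||^2, ||y_a'||^2) on [0, 1] for the ramp y_a(x) = min(x/a, 1)."""
    h = 1.0 / n
    dist_sq = 0.0
    deriv_sq = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        y = min(x / a, 1.0)
        dy = 1.0 / a if x < a else 0.0   # slope 1/a on [0, a], zero beyond
        dist_sq += (y - 1.0) ** 2 * h
        deriv_sq += dy ** 2 * h
    return dist_sq, deriv_sq

for a in (0.1, 0.01, 0.001):
    # compare the quadrature with the closed forms a/3 and 1/a
    print(a, ramp_norms(a), (a / 3.0, 1.0 / a))
```

The functions converge in $L^2$, but their images under $d/dx$ do not, so the limit function cannot be added to the domain by closure.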

This story repeats for differential operators of any order: If we try to
impose boundary conditions of too high an order, they are washed out in the
process of closing the operator. Boundary conditions of lower order cannot
be eliminated, however, and so make sense as statements involving functions
in $L^2$.


9.8 Series solutions 


One of the advantages of recasting a problem as an integral equation is that
the equation often suggests a systematic approximation scheme. Usually we
start from the solution of an exactly solvable problem and expand the desired
solution about it as an infinite series in some small parameter. The terms in
such a perturbation series may become progressively harder to evaluate, but,
if we are lucky, the sum of the first few will prove adequate for our purposes.


9.8.1 Liouville-Neumann-Born series 


The geometric series
$$S = 1 - x + x^2 - x^3 + \cdots \tag{9.146}$$
converges to $1/(1+x)$ provided $|x| < 1$. Suppose we wish to solve
$$(I + \lambda K)\varphi = f \tag{9.147}$$
where $K$ is an integral operator. It is then natural to write
$$\varphi = (I + \lambda K)^{-1} f = (1 - \lambda K + \lambda^2 K^2 - \lambda^3 K^3 + \cdots) f \tag{9.148}$$
where
$$K^2(x,y) = \int K(x,z)\,K(z,y)\,dz, \qquad K^3(x,y) = \int K(x,z_1)\,K(z_1,z_2)\,K(z_2,y)\,dz_1\,dz_2, \tag{9.149}$$
and so on. This Liouville-Neumann series will converge, and yield a solution
to the problem, provided that $|\lambda|\,\|K\| < 1$. In quantum mechanics this series
is known as the Born series.
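
The series (9.148) can be summed term by term on a grid. The sketch below is illustrative only: the function name `neumann_series`, the midpoint discretization, and the test kernel are choices made here. It assumes $K(x,y) = -xy$ and $f(x) = x$ on $[0,1]$, so that (9.147) reads $\varphi(x) = x + \lambda\int_0^1 xy\,\varphi(y)\,dy$, whose closed-form solution is $3x/(3-\lambda)$. For this rank-one kernel successive terms shrink by the factor $\lambda/3$, so the series converges for $|\lambda| < 3$.

```python
def neumann_series(kernel, f, lam, n_grid=100, n_terms=40):
    """Sum the series (9.148) for (I + lam K) phi = f on [0, 1] with a midpoint rule."""
    h = 1.0 / n_grid
    xs = [(i + 0.5) * h for i in range(n_grid)]
    K = [[kernel(x, y) * h for y in xs] for x in xs]   # quadrature weight folded in
    term = [f(x) for x in xs]                           # zeroth term of the series: f
    phi = term[:]
    for _ in range(n_terms):
        # each new term is (-lam K) applied to the previous one
        term = [-lam * sum(row[j] * term[j] for j in range(n_grid)) for row in K]
        phi = [p + t for p, t in zip(phi, term)]
    return xs, phi

lam = 1.0
xs, phi = neumann_series(lambda x, y: -x * y, lambda x: x, lam)
print(max(abs(p - 3.0 * x / (3.0 - lam)) for x, p in zip(xs, phi)))
```

The printed number is the largest deviation from the closed form on the grid; it is limited by the quadrature error rather than by the truncation of the series.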


9.8.2 Fredholm series 


A familiar result from high-school algebra is Cramer’s rule, which gives the
solution of a set of linear equations in terms of ratios of determinants. For
example, the system of equations
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1,\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2,\\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3,
\end{aligned} \tag{9.150}$$
has solution
$$x_1 = \frac{1}{D}\begin{vmatrix} b_1 & a_{12} & a_{13}\\ b_2 & a_{22} & a_{23}\\ b_3 & a_{32} & a_{33}\end{vmatrix}, \quad
x_2 = \frac{1}{D}\begin{vmatrix} a_{11} & b_1 & a_{13}\\ a_{21} & b_2 & a_{23}\\ a_{31} & b_3 & a_{33}\end{vmatrix}, \quad
x_3 = \frac{1}{D}\begin{vmatrix} a_{11} & a_{12} & b_1\\ a_{21} & a_{22} & b_2\\ a_{31} & a_{32} & b_3\end{vmatrix}, \tag{9.151}$$
where
$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix}. \tag{9.152}$$
Although not as computationally efficient as standard Gaussian elimination,
Cramer’s rule is useful in that it is a closed-form solution. It is equivalent to
the statement that the inverse of a matrix is given by the transposed matrix
of the co-factors, divided by the determinant.
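
As a small worked instance of (9.151), the sketch below applies Cramer's rule to a three-by-three system using exact rational arithmetic. The function names and the particular system are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(a, b):
    """Solve a x = b for a 3x3 matrix a by Cramer's rule, equation (9.151)."""
    D = det3(a)
    if D == 0:
        raise ValueError("determinant vanishes; Cramer's rule does not apply")
    solution = []
    for col in range(3):
        # replace column `col` of a by the right-hand side b
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        solution.append(Fraction(det3(replaced), D))
    return solution

a = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
print(cramer3(a, b))   # the exact solution is x = (2, 3, -1)
```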


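A similar formula for integral equations was given by Fredholm. The
equations he considered were, in operator form,
$$(I + \lambda K)\varphi = f, \tag{9.153}$$
where $I$ is the identity operator, $K$ is an integral operator with kernel
$K(x,y)$, and $\lambda$ a parameter. We motivate Fredholm’s formula by giving an
expansion for the determinant of a finite matrix. Let $K$ be an $n$-by-$n$ matrix and
$$D(\lambda) \stackrel{\rm def}{=} \det(I + \lambda K) = \begin{vmatrix}
1 + \lambda K_{11} & \lambda K_{12} & \cdots & \lambda K_{1n}\\
\lambda K_{21} & 1 + \lambda K_{22} & \cdots & \lambda K_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
\lambda K_{n1} & \lambda K_{n2} & \cdots & 1 + \lambda K_{nn}
\end{vmatrix}. \tag{9.154}$$
Then
$$D(\lambda) = \sum_{m=0}^{n} \frac{\lambda^m}{m!}\,A_m, \tag{9.155}$$
where $A_0 = 1$, $A_1 = \mathrm{tr}\,K \equiv \sum_i K_{ii}$,
$$A_2 = \sum_{i_1,i_2=1}^{n}\begin{vmatrix} K_{i_1 i_1} & K_{i_1 i_2}\\ K_{i_2 i_1} & K_{i_2 i_2}\end{vmatrix}, \qquad
A_3 = \sum_{i_1,i_2,i_3=1}^{n}\begin{vmatrix} K_{i_1 i_1} & K_{i_1 i_2} & K_{i_1 i_3}\\ K_{i_2 i_1} & K_{i_2 i_2} & K_{i_2 i_3}\\ K_{i_3 i_1} & K_{i_3 i_2} & K_{i_3 i_3}\end{vmatrix}. \tag{9.156}$$
The pattern for the rest of the terms should be obvious, as should the proof.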

As observed above, the inverse of a matrix is the reciprocal of the deter- 
minant of the matrix multiplied by the transposed matrix of the co-factors. 
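So, if $D_{\mu\nu}$ is the co-factor of the term in $D(\lambda)$ associated with $K_{\nu\mu}$, then the
solution of the matrix equation
$$(I + \lambda K)x = b \tag{9.157}$$
is
$$x_\mu = \frac{D_{\mu 1}b_1 + D_{\mu 2}b_2 + \cdots + D_{\mu n}b_n}{D(\lambda)}. \tag{9.158}$$
If $\mu \ne \nu$ we have
$$D_{\mu\nu} = -\lambda K_{\mu\nu} - \lambda^2\sum_{i}\begin{vmatrix} K_{\mu\nu} & K_{\mu i}\\ K_{i\nu} & K_{ii}\end{vmatrix}
- \frac{\lambda^3}{2!}\sum_{i_1,i_2}\begin{vmatrix} K_{\mu\nu} & K_{\mu i_1} & K_{\mu i_2}\\ K_{i_1\nu} & K_{i_1 i_1} & K_{i_1 i_2}\\ K_{i_2\nu} & K_{i_2 i_1} & K_{i_2 i_2}\end{vmatrix} - \cdots. \tag{9.159}$$
When $\mu = \nu$ we have
$$D_{\mu\nu} = \delta_{\mu\nu}\,\tilde D(\lambda), \tag{9.160}$$
where $\tilde D(\lambda)$ is the expression analogous to $D(\lambda)$, but with the $\mu$’th row and
column deleted.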
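
The determinant expansion (9.155) is easy to confirm for a small matrix. The following sketch uses exact rational arithmetic; the helper names and the test matrix are arbitrary illustrative choices. It builds each $A_m$ as a sum over all index tuples of the corresponding $m$-by-$m$ determinants, and compares the resulting polynomial with a direct evaluation of $\det(I + \lambda K)$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if not m:
        return Fraction(1)          # determinant of the empty matrix
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def fredholm_coefficient(K, m):
    """A_m: sum over all index tuples (i_1..i_m) of det[K_{i_a i_b}], as in (9.156)."""
    n = len(K)
    return sum(det([[K[a][b] for b in idx] for a in idx])
               for idx in product(range(n), repeat=m))

def det_via_expansion(K, lam):
    """Right-hand side of (9.155)."""
    return sum(lam ** m / factorial(m) * fredholm_coefficient(K, m)
               for m in range(len(K) + 1))

def det_direct(K, lam):
    """Left-hand side of (9.155): det(I + lam K)."""
    n = len(K)
    return det([[(1 if i == j else 0) + lam * K[i][j] for j in range(n)] for i in range(n)])

K = [[1, 2, 0], [3, -1, 4], [2, 5, 1]]
lam = Fraction(1, 2)
print(det_direct(K, lam), det_via_expansion(K, lam))
```

Index tuples with a repeated entry contribute determinants with two equal rows, which vanish; this is why the unrestricted sums, divided by $m!$, reproduce the sums of principal minors.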


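These elementary results suggest the definition of the Fredholm determinant
of the integral kernel $K(x,y)$, $a < x, y < b$, as
$$D(\lambda) = \mathrm{Det}\,|I + \lambda K| = \sum_{m=0}^{\infty}\frac{\lambda^m}{m!}\,A_m, \tag{9.161}$$
where $A_0 = 1$, $A_1 = \mathrm{Tr}\,K \equiv \int_a^b K(x,x)\,dx$,
$$A_2 = \int_a^b\!\!\int_a^b \begin{vmatrix} K(x_1,x_1) & K(x_1,x_2)\\ K(x_2,x_1) & K(x_2,x_2)\end{vmatrix} dx_1\,dx_2, \qquad
A_3 = \int_a^b\!\!\int_a^b\!\!\int_a^b \begin{vmatrix} K(x_1,x_1) & K(x_1,x_2) & K(x_1,x_3)\\ K(x_2,x_1) & K(x_2,x_2) & K(x_2,x_3)\\ K(x_3,x_1) & K(x_3,x_2) & K(x_3,x_3)\end{vmatrix} dx_1\,dx_2\,dx_3, \tag{9.162}$$
etc. We also define
$$D(x,y,\lambda) = -\lambda K(x,y) - \lambda^2\int_a^b \begin{vmatrix} K(x,y) & K(x,\xi)\\ K(\xi,y) & K(\xi,\xi)\end{vmatrix} d\xi
- \frac{\lambda^3}{2!}\int_a^b\!\!\int_a^b \begin{vmatrix} K(x,y) & K(x,\xi_1) & K(x,\xi_2)\\ K(\xi_1,y) & K(\xi_1,\xi_1) & K(\xi_1,\xi_2)\\ K(\xi_2,y) & K(\xi_2,\xi_1) & K(\xi_2,\xi_2)\end{vmatrix} d\xi_1\,d\xi_2 - \cdots, \tag{9.163}$$
and then
$$\varphi(x) = f(x) + \frac{1}{D(\lambda)}\int_a^b D(x,y,\lambda)\,f(y)\,dy \tag{9.164}$$
is the solution of the equation
$$\varphi(x) + \lambda\int_a^b K(x,y)\,\varphi(y)\,dy = f(x). \tag{9.165}$$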


If $|K(x,y)| < M$ in $[a,b] \times [a,b]$, the Fredholm series for $D(\lambda)$ and $D(x,y,\lambda)$
converge for all $\lambda$, and define entire functions. In this feature it is unlike the
Neumann series, which has a finite radius of convergence.

The proof of these claims follows from the identity
$$D(x,y,\lambda) + \lambda D(\lambda)K(x,y) + \lambda\int_a^b D(x,\xi,\lambda)\,K(\xi,y)\,d\xi = 0, \tag{9.166}$$
or, more compactly with $G(x,y) = D(x,y,\lambda)/D(\lambda)$,
$$(I + G)(I + \lambda K) = I. \tag{9.167}$$
For details see Whittaker and Watson §11.2.
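Example: The equation
$$\varphi(x) = x + \lambda\int_0^1 xy\,\varphi(y)\,dy \tag{9.168}$$
gives us
$$D(\lambda) = 1 - \frac{\lambda}{3}, \qquad D(x,y,\lambda) = \lambda xy, \tag{9.169}$$
and so
$$\varphi(x) = \frac{3x}{3 - \lambda}. \tag{9.170}$$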

(We have considered this equation and solution before, in section 9.4.)
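
This example can be checked by evaluating the Fredholm formula (9.164) numerically. The sketch below is a minimal illustration under stated assumptions: it writes the equation in the form (9.165) with $K(x,y) = -xy$ and $f(x) = x$, uses a midpoint rule for the integrals, and relies on the fact that for this rank-one kernel all the determinants of size two or more vanish, so both Fredholm series stop after their first non-trivial term. The function name is hypothetical.

```python
def fredholm_solution(x, lam, n=2000):
    """Evaluate (9.164) for K(x, y) = -x*y and f(x) = x on [0, 1] by the midpoint rule."""
    h = 1.0 / n
    ys = [(k + 0.5) * h for k in range(n)]
    trace_K = sum(-y * y for y in ys) * h            # A_1 = Tr K = -1/3
    D = 1.0 + lam * trace_K                           # A_m = 0 for m >= 2 (rank-one kernel)
    # D(x, y, lam) = lam * x * y for this kernel, multiplied by f(y) = y
    integral = sum(lam * x * y * y for y in ys) * h
    return x + integral / D

for lam in (0.5, 1.2, -4.0):
    x = 0.5
    print(lam, fredholm_solution(x, lam), 3.0 * x / (3.0 - lam))
```

Unlike the Neumann series, the formula remains usable for $|\lambda| > 3$, failing only at the zero $\lambda = 3$ of $D(\lambda)$.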
9.9 Further exercises and problems


Exercise 9.1: The following problems should be relatively easy. 


a) Solve the inhomogeneous type II Fredholm integral equation
$$u(x) = e^x + \lambda\int_0^1 xy\,u(y)\,dy.$$


b) Solve the homogeneous type II Fredholm integral equation
$$u(x) = \lambda\int_0^{\pi} \sin(x - y)\,u(y)\,dy.$$


c) Solve the integral equation
$$u(x) = x + \lambda\int_0^1 (yx + y^2)\,u(y)\,dy$$
to second order in $\lambda$ using
(i) the Neumann series; and
(ii) the Fredholm series.
d) By differentiating, solve the integral equation: $u(x) = x + \int_0^x u(y)\,dy$.
e) Solve the integral equation: $u(x) = x^2 + \int_0^1 xy\,u(y)\,dy$.
f) Find the eigenfunction(s) and eigenvalue(s) of the integral equation
$$u(x) = \lambda\int_0^1 e^{x-y}\,u(y)\,dy.$$
g) Solve the integral equation: $u(x) = e^x + \lambda\int_0^1 e^{x-y}\,u(y)\,dy$.
h) Solve the integral equation
$$u(x) = x + \int_0^1 dy\,(1 + xy)\,u(y)$$
for the unknown function $u(x)$.


Exercise 9.2: Solve the integral equation
$$u(x) = f(x) + \lambda\int_0^1 x^3y^3\,u(y)\,dy, \quad 0 < x < 1$$
for the unknown $u(x)$ in terms of the given function $f(x)$. For what values
of $\lambda$ does a unique solution $u(x)$ exist without restrictions on $f(x)$? For what
value $\lambda = \lambda_0$ does a solution exist only if $f(x)$ satisfies some condition? Using
the language of the Fredholm alternative, and the range and nullspace of the
relevant operators, explain what is happening when $\lambda = \lambda_0$. For the case
$\lambda = \lambda_0$ find explicitly the condition on $f(x)$ and, assuming this condition is
satisfied, write down the corresponding general solution for $u(x)$. Check that
this solution does indeed satisfy the integral equation.


Exercise 9.3: Use a Laplace transform to find the solution to the generalized
Abel equation
$$f(x) = \int_0^x (x - t)^{-\mu}\,u(t)\,dt, \quad 0 < \mu < 1,$$
where $f(x)$ is given and $f(0) = 0$. Your solution will be of the form
$$u(x) = \int_0^x K(x - t)\,f'(t)\,dt,$$
and you should give an explicit expression for the kernel $K(x - t)$.
You will find the formula
$$\int_0^{\infty} t^{\mu - 1}e^{-pt}\,dt = p^{-\mu}\,\Gamma(\mu), \quad \mu > 0,$$
to be useful.


Exercise 9.4: Translationally invariant kernels:
a) Consider the integral equation: $u(x) = g(x) + \lambda\int_{-\infty}^{\infty} K(x,y)\,u(y)\,dy$,
with the translationally invariant kernel $K(x,y) = Q(x - y)$, in which $g$,
$\lambda$ and $Q$ are known. Show that the Fourier transforms $\hat u$, $\hat g$ and $\hat Q$ satisfy
$\hat u(q) = \hat g(q)/\{1 - \sqrt{2\pi}\,\lambda\hat Q(q)\}$. Expand this result to second order in $\lambda$
to recover the second-order Liouville-Neumann-Born series.
b) Use Fourier transforms to find a solution of the integral equation
$$u(x) = e^{-|x|} + \lambda\int_{-\infty}^{\infty} e^{-|x-y|}\,u(y)\,dy$$
that remains finite as $|x| \to \infty$.
c) Use Laplace transforms to find a solution of the integral equation
$$u(x) = e^{-x} + \lambda\int_0^x e^{-|x-y|}\,u(y)\,dy, \quad x > 0.$$

Exercise 9.5: The integral equation
$$\frac{1}{\pi}\int_0^{\infty}\frac{\varphi(y)}{x + y}\,dy = f(x), \quad x > 0,$$
relates the unknown function $\varphi$ to the known function $f$.


(i) Show that the changes of variables
$$x = \exp 2\xi, \quad y = \exp 2\eta, \quad \varphi(\exp 2\eta)\exp\eta = \psi(\eta), \quad f(\exp 2\xi)\exp\xi = g(\xi),$$
convert the integral equation into one that can be solved by an integral
transform.

(ii) Hence, or otherwise, construct an explicit formula for $\varphi(x)$ in terms of a
double integral involving $f(y)$.

You may use without proof the integral
$$\int_{-\infty}^{\infty}\frac{e^{-is\xi}}{\cosh\xi}\,d\xi = \frac{\pi}{\cosh(\pi s/2)}.$$


Exercise 9.6: Using Mellin transforms. Recall that the Mellin transform $\tilde f(s)$
of the function $f(t)$ is defined by
$$\tilde f(s) = \int_0^{\infty} dt\,t^{s-1}f(t).$$

a) Given two functions, $f(t)$ and $g(t)$, a Mellin convolution $f * g$ can be
defined through
$$(f * g)(t) = \int_0^{\infty} f(tu^{-1})\,g(u)\,\frac{du}{u}.$$
Show that the Mellin transform of the Mellin convolution $f * g$ is
$$\widetilde{f * g}(s) = \int_0^{\infty} t^{s-1}(f * g)(t)\,dt = \tilde f(s)\,\tilde g(s).$$
Similarly find the Mellin transform of
$$(f\,\#\,g)(t) \stackrel{\rm def}{=} \int_0^{\infty} f(tu)\,g(u)\,du.$$

b) The unknown function $F(t)$ satisfies Fox’s integral equation,
$$F(t) = G(t) + \int_0^{\infty} dv\,Q(tv)\,F(v),$$
in which $G$ and $Q$ are known. Solve for the Mellin transform $\tilde F$ in terms
of the Mellin transforms $\tilde G$ and $\tilde Q$.

Exercise 9.7: Some more easy problems:


a) Solve the Lalesco-Picard integral equation
$$u(x) = \cos\mu x + \int_{-\infty}^{\infty} dy\,e^{-|x-y|}\,u(y).$$

b) For $\lambda \ne 3$, solve the integral equation
$$\varphi(x) = 1 + \lambda\int_0^1 dy\,xy\,\varphi(y).$$

c) By taking derivatives, show that the solution of the Volterra equation
$$x = \int_0^x dy\,(e^x + e^y)\,\psi(y)$$
satisfies a first order differential equation. Hence, solve the integral
equation.


Exercise 9.8: Principal part integrals.

a) If $w$ is real, show that
$$P\!\int_{-\infty}^{\infty} e^{-u^2}\,\frac{1}{u - w}\,du = -2\sqrt{\pi}\,e^{-w^2}\int_0^{w} e^{u^2}\,du.$$
(This is easier than it looks.)
b) If $y$ is real, but not in the interval $(-1,1)$, show that
$$\int_{-1}^{1}\frac{1}{(y - x)\sqrt{1 - x^2}}\,dx = \frac{\pi}{\sqrt{y^2 - 1}}.$$
Now let $y \in (-1,1)$. Show that
$$P\!\int_{-1}^{1}\frac{1}{(y - x)\sqrt{1 - x^2}}\,dx = 0.$$
(This is harder than it looks.)


Exercise 9.9:

Consider the integral equation
$$u(x) = g(x) + \lambda\int_0^1 K(x,y)\,u(y)\,dy,$$
in which only $u$ is unknown.

a) Write down the solution $u(x)$ to second order in the Liouville-Neumann-Born
series.

b) Suppose $g(x) = x$ and $K(x,y) = \sin 2\pi xy$. Compute $u(x)$ to second
order in the Liouville-Neumann-Born series.


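Exercise 9.10: Show that the application of the Fredholm series method to
the equation
$$\varphi(x) = x + \lambda\int_0^1 (xy + y^2)\,\varphi(y)\,dy$$
gives
$$D(\lambda) = 1 - \frac{2}{3}\lambda - \frac{1}{72}\lambda^2$$
and
$$D(x,y,\lambda) = \lambda(xy + y^2) + \lambda^2\left(\frac{1}{2}xy^2 - \frac{1}{3}xy - \frac{1}{3}y^2 + \frac{1}{4}y\right).$$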


Chapter 10 


Vectors and Tensors 


In this chapter we explain how a vector space V gives rise to a family of 
associated tensor spaces, and how mathematical objects such as linear maps 
or quadratic forms should be understood as being elements of these spaces. 
We then apply these ideas to physics. We make extensive use of notions and 
notations from the appendix on linear algebra, so it may help to review that 
material before we begin. 


10.1 Covariant and contravariant vectors 


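When we have a vector space $V$ over $\mathbb{R}$, and $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ and $\{\mathbf{e}'_1, \mathbf{e}'_2, \ldots, \mathbf{e}'_n\}$
are both bases for $V$, then we may expand each of the basis vectors $\mathbf{e}_\nu$ in
terms of the $\mathbf{e}'_\mu$ as
$$\mathbf{e}_\nu = a^{\mu}_{\ \nu}\,\mathbf{e}'_\mu. \tag{10.1}$$
We are here, as usual, using the Einstein summation convention that repeated
indices are to be summed over. Written out in full for a three-dimensional
space, the expansion would be
$$\begin{aligned}
\mathbf{e}_1 &= a^{1}_{\ 1}\mathbf{e}'_1 + a^{2}_{\ 1}\mathbf{e}'_2 + a^{3}_{\ 1}\mathbf{e}'_3,\\
\mathbf{e}_2 &= a^{1}_{\ 2}\mathbf{e}'_1 + a^{2}_{\ 2}\mathbf{e}'_2 + a^{3}_{\ 2}\mathbf{e}'_3,\\
\mathbf{e}_3 &= a^{1}_{\ 3}\mathbf{e}'_1 + a^{2}_{\ 3}\mathbf{e}'_2 + a^{3}_{\ 3}\mathbf{e}'_3.
\end{aligned}$$
We could also have expanded the $\mathbf{e}'_\mu$ in terms of the $\mathbf{e}_\mu$ as
$$\mathbf{e}'_\nu = (a^{-1})^{\mu}_{\ \nu}\,\mathbf{e}_\mu. \tag{10.2}$$
As the notation implies, the matrices of coefficients $a^{\mu}_{\ \nu}$ and $(a^{-1})^{\mu}_{\ \nu}$ are inverses
of each other:
$$a^{\mu}_{\ \nu}\,(a^{-1})^{\nu}_{\ \sigma} = (a^{-1})^{\mu}_{\ \nu}\,a^{\nu}_{\ \sigma} = \delta^{\mu}_{\sigma}. \tag{10.3}$$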


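If we know the components $x^\mu$ of a vector $\mathbf{x}$ in the $\mathbf{e}_\mu$ basis then the
components $x'^\mu$ of $\mathbf{x}$ in the $\mathbf{e}'_\mu$ basis are obtained from
$$\mathbf{x} = x'^\mu\,\mathbf{e}'_\mu = x^\nu\,\mathbf{e}_\nu = (x^\nu a^{\mu}_{\ \nu})\,\mathbf{e}'_\mu \tag{10.4}$$
by comparing the coefficients of $\mathbf{e}'_\mu$. We find that $x'^\mu = a^{\mu}_{\ \nu}x^\nu$. Observe how
the $\mathbf{e}_\mu$ and the $x^\mu$ transform in “opposite” directions. The components $x^\mu$
are therefore said to transform contravariantly.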

Associated with the vector space $V$ is its dual space $V^*$, whose elements
are covectors, i.e. linear maps $\mathbf{f} : V \to \mathbb{R}$. If $\mathbf{f} \in V^*$ and $\mathbf{x} = x^\mu\mathbf{e}_\mu$, we use
the linearity property to evaluate $\mathbf{f}(\mathbf{x})$ as
$$\mathbf{f}(\mathbf{x}) = \mathbf{f}(x^\mu\mathbf{e}_\mu) = x^\mu\,\mathbf{f}(\mathbf{e}_\mu) = x^\mu f_\mu. \tag{10.5}$$
Here, the set of numbers $f_\mu = \mathbf{f}(\mathbf{e}_\mu)$ are the components of the covector $\mathbf{f}$. If
we change basis so that $\mathbf{e}_\nu = a^{\mu}_{\ \nu}\mathbf{e}'_\mu$ then
$$f_\nu = \mathbf{f}(\mathbf{e}_\nu) = \mathbf{f}(a^{\mu}_{\ \nu}\mathbf{e}'_\mu) = a^{\mu}_{\ \nu}\,\mathbf{f}(\mathbf{e}'_\mu) = a^{\mu}_{\ \nu}\,f'_\mu. \tag{10.6}$$
We conclude that $f_\nu = a^{\mu}_{\ \nu}f'_\mu$. The $f_\mu$ components transform in the same
manner as the basis. They are therefore said to transform covariantly. In physics
it is traditional to call the set of numbers $x^\mu$ with upstairs indices (the
components of) a contravariant vector. Similarly, the set of numbers $f_\mu$ with
downstairs indices is called (the components of) a covariant vector. Thus,
contravariant vectors are elements of $V$ and covariant vectors are elements
of $V^*$.
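
The two transformation laws are arranged so that the number $\mathbf{f}(\mathbf{x}) = f_\mu x^\mu$ is the same in both bases. A minimal sketch of this bookkeeping follows, with a hypothetical two-by-two matrix `a` storing $a^{\mu}_{\ \nu}$ as `a[mu][nu]` and arbitrary sample components; none of the names come from the text.

```python
def transform_contravariant(a, x):
    """x'^mu = a^mu_nu x^nu, with a[mu][nu] holding a^mu_nu."""
    n = len(x)
    return [sum(a[mu][nu] * x[nu] for nu in range(n)) for mu in range(n)]

def old_covariant_from_new(a, f_new):
    """f_nu = a^mu_nu f'_mu, equation (10.6)."""
    n = len(f_new)
    return [sum(a[mu][nu] * f_new[mu] for mu in range(n)) for nu in range(n)]

def pairing(f, x):
    """The invariant f_mu x^mu."""
    return sum(f_mu * x_mu for f_mu, x_mu in zip(f, x))

a = [[2, 1], [1, 1]]        # any invertible matrix will do
x_old = [3, -2]             # contravariant components in the unprimed basis
f_new = [5, 7]              # covariant components in the primed basis

x_new = transform_contravariant(a, x_old)
f_old = old_covariant_from_new(a, f_new)
print(pairing(f_old, x_old), pairing(f_new, x_new))   # the two numbers agree
```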

The relationship between $V$ and $V^*$ is one of mutual duality, and to
mathematicians it is only a matter of convenience which space is $V$ and
which space is $V^*$. The evaluation of $\mathbf{f} \in V^*$ on $\mathbf{x} \in V$ is therefore often
written as a “pairing” $(\mathbf{f}, \mathbf{x})$, which gives equal status to the objects being
put together to get a number. A physics example of such a mutually dual pair
is provided by the space of displacements $\mathbf{x}$ and the space of wave-numbers
$\mathbf{k}$. The units of $\mathbf{x}$ and $\mathbf{k}$ are different (meters versus meters$^{-1}$). There is
therefore no meaning to “$\mathbf{x} + \mathbf{k}$,” and $\mathbf{x}$ and $\mathbf{k}$ are not elements of the same
vector space. The “dot” in expressions such as
$$\varphi(\mathbf{x}) = e^{i\mathbf{k}\cdot\mathbf{x}} \tag{10.7}$$
cannot be a true inner product (which requires the objects it links to be in
the same vector space) but is instead a pairing
$$(\mathbf{k}, \mathbf{x}) \equiv \mathbf{k}(\mathbf{x}) = k_\mu x^\mu. \tag{10.8}$$


In describing the physical world we usually give priority to the space in which 
we live, breathe and move, and so treat it as being “V”. The displacement 
vector x then becomes the contravariant vector, and the Fourier-space wave- 
number k, being the more abstract quantity, becomes the covariant covector. 

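Our vector space may come equipped with a metric that is derived from
a non-degenerate inner product. We regard the inner product as being a
bilinear form $\mathbf{g} : V \times V \to \mathbb{R}$, so the length $\|\mathbf{x}\|$ of a vector $\mathbf{x}$ is $\sqrt{\mathbf{g}(\mathbf{x},\mathbf{x})}$.
The set of numbers
$$g_{\mu\nu} = \mathbf{g}(\mathbf{e}_\mu, \mathbf{e}_\nu) \tag{10.9}$$
comprises (the components of) the metric tensor. In terms of them, the
inner product $\langle\mathbf{x},\mathbf{y}\rangle$ of a pair of vectors $\mathbf{x} = x^\mu\mathbf{e}_\mu$ and $\mathbf{y} = y^\mu\mathbf{e}_\mu$ becomes
$$\langle\mathbf{x},\mathbf{y}\rangle \equiv \mathbf{g}(\mathbf{x},\mathbf{y}) = g_{\mu\nu}\,x^\mu y^\nu. \tag{10.10}$$
Real-valued inner products are always symmetric, so $\mathbf{g}(\mathbf{x},\mathbf{y}) = \mathbf{g}(\mathbf{y},\mathbf{x})$ and
$g_{\mu\nu} = g_{\nu\mu}$. As the product is non-degenerate, the matrix $g_{\mu\nu}$ has an inverse,
which is traditionally written as $g^{\mu\nu}$. Thus
$$g^{\mu\nu}g_{\nu\lambda} = g_{\lambda\nu}g^{\nu\mu} = \delta^{\mu}_{\lambda}. \tag{10.11}$$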


The additional structure provided by the metric permits us to identify $V$
with $V^*$. The identification is possible, because, given any $\mathbf{f} \in V^*$, we can
find a vector $\tilde{\mathbf{f}} \in V$ such that
$$\mathbf{f}(\mathbf{x}) = \langle\tilde{\mathbf{f}}, \mathbf{x}\rangle. \tag{10.12}$$
We obtain $\tilde{\mathbf{f}}$ by solving the equation
$$f_\mu = g_{\mu\nu}\tilde f^\nu \tag{10.13}$$
to get $\tilde f^\nu = g^{\nu\mu}f_\mu$. We may now drop the tilde and identify $\mathbf{f}$ with $\tilde{\mathbf{f}}$, and
hence $V$ with $V^*$. When we do this, we say that the covariant components
$f_\mu$ are related to the contravariant components $f^\mu$ by raising
$$f^\mu = g^{\mu\nu}f_\nu, \tag{10.14}$$
or lowering
$$f_\mu = g_{\mu\nu}f^\nu, \tag{10.15}$$
the index $\mu$ using the metric tensor. Bear in mind that this $V \cong V^*$
identification depends crucially on the metric. A different metric will, in general,
identify an $\mathbf{f} \in V^*$ with a completely different $\tilde{\mathbf{f}} \in V$.

We may play this game in the Euclidean space $E^n$ with its “dot” inner
product. Given a vector $\mathbf{x}$ and a basis $\mathbf{e}_\mu$ for which $g_{\mu\nu} = \mathbf{e}_\mu\cdot\mathbf{e}_\nu$, we can
define two sets of components for the same vector. Firstly the coefficients $x^\mu$
appearing in the basis expansion
$$\mathbf{x} = x^\mu\mathbf{e}_\mu, \tag{10.16}$$
and secondly the “components”
$$x_\mu = \mathbf{e}_\mu\cdot\mathbf{x} = \mathbf{g}(\mathbf{e}_\mu, \mathbf{x}) = \mathbf{g}(\mathbf{e}_\mu, x^\nu\mathbf{e}_\nu) = \mathbf{g}(\mathbf{e}_\mu, \mathbf{e}_\nu)\,x^\nu = g_{\mu\nu}x^\nu \tag{10.17}$$
of $\mathbf{x}$ along the basis vectors. These two sets of numbers are then respectively
called the contravariant and covariant components of the vector $\mathbf{x}$. If the
$\mathbf{e}_\mu$ constitute an orthonormal basis, where $g_{\mu\nu} = \delta_{\mu\nu}$, then the two sets of
components (covariant and contravariant) are numerically coincident. In a
non-orthogonal basis they will be different, and we must take care never to
add contravariant components to covariant ones.
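
A two-dimensional example shows how the two sets of components differ. The basis below, $\mathbf{e}_1 = (1,0)$ and $\mathbf{e}_2 = (1,1)$ in Cartesian coordinates, is a hypothetical choice made for illustration, as are the function names. The sketch builds the metric, lowers an index with it, raises it again with the inverse metric, and checks that $x_\mu x^\mu$ equals the squared Euclidean length.

```python
from fractions import Fraction

def dot(u, v):
    """Ordinary Euclidean dot product of Cartesian component lists."""
    return sum(p * q for p, q in zip(u, v))

basis = [(1, 0), (1, 1)]                                            # non-orthogonal basis of E^2
metric = [[dot(e_mu, e_nu) for e_nu in basis] for e_mu in basis]    # g_{mu nu} = e_mu . e_nu

def lower_index(metric, up):
    """x_mu = g_{mu nu} x^nu, equation (10.17)."""
    n = len(up)
    return [sum(metric[m][k] * up[k] for k in range(n)) for m in range(n)]

def raise_index(metric, down):
    """x^mu = g^{mu nu} x_nu for a 2x2 metric, using the explicit matrix inverse."""
    (p, q), (r, s) = metric
    det = Fraction(p * s - q * r)
    inverse = [[s / det, -q / det], [-r / det, p / det]]
    return [sum(inverse[m][k] * down[k] for k in range(2)) for m in range(2)]

x_up = [2, 3]                                                        # contravariant components
cartesian = [sum(x_up[m] * basis[m][i] for m in range(2)) for i in range(2)]
x_down = lower_index(metric, x_up)                                   # covariant components
print(metric, x_down, raise_index(metric, x_down))
print(dot(x_down, x_up), dot(cartesian, cartesian))                  # both give the squared length
```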


10.2 Tensors 


We now introduce tensors in two ways: firstly as sets of numbers labelled by 
indices and equipped with transformation laws that tell us how these numbers 
change as we change basis; and secondly as basis-independent objects that 
are elements of a vector space constructed by taking multiple tensor products 
of the spaces V and V*. 


10.2.1 Transformation rules 


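After we change basis $\mathbf{e}_\mu \to \mathbf{e}'_\mu$, where $\mathbf{e}_\nu = a^{\mu}_{\ \nu}\mathbf{e}'_\mu$, the metric tensor will be
represented by a new set of components
$$g'_{\mu\nu} = \mathbf{g}(\mathbf{e}'_\mu, \mathbf{e}'_\nu). \tag{10.18}$$
These are related to the old components by
$$g_{\mu\nu} = \mathbf{g}(\mathbf{e}_\mu, \mathbf{e}_\nu) = \mathbf{g}(a^{\rho}_{\ \mu}\mathbf{e}'_\rho, a^{\sigma}_{\ \nu}\mathbf{e}'_\sigma) = a^{\rho}_{\ \mu}a^{\sigma}_{\ \nu}\,\mathbf{g}(\mathbf{e}'_\rho, \mathbf{e}'_\sigma) = a^{\rho}_{\ \mu}a^{\sigma}_{\ \nu}\,g'_{\rho\sigma}. \tag{10.19}$$
This transformation rule for $g_{\mu\nu}$ has both of its subscripts behaving like the
downstairs indices of a covector. We therefore say that $g_{\mu\nu}$ transforms as a
doubly covariant tensor. Written out in full, for a two-dimensional space,
the transformation law is
$$\begin{aligned}
g_{11} &= a^{1}_{\ 1}a^{1}_{\ 1}g'_{11} + a^{1}_{\ 1}a^{2}_{\ 1}g'_{12} + a^{2}_{\ 1}a^{1}_{\ 1}g'_{21} + a^{2}_{\ 1}a^{2}_{\ 1}g'_{22},\\
g_{12} &= a^{1}_{\ 1}a^{1}_{\ 2}g'_{11} + a^{1}_{\ 1}a^{2}_{\ 2}g'_{12} + a^{2}_{\ 1}a^{1}_{\ 2}g'_{21} + a^{2}_{\ 1}a^{2}_{\ 2}g'_{22},\\
g_{21} &= a^{1}_{\ 2}a^{1}_{\ 1}g'_{11} + a^{1}_{\ 2}a^{2}_{\ 1}g'_{12} + a^{2}_{\ 2}a^{1}_{\ 1}g'_{21} + a^{2}_{\ 2}a^{2}_{\ 1}g'_{22},\\
g_{22} &= a^{1}_{\ 2}a^{1}_{\ 2}g'_{11} + a^{1}_{\ 2}a^{2}_{\ 2}g'_{12} + a^{2}_{\ 2}a^{1}_{\ 2}g'_{21} + a^{2}_{\ 2}a^{2}_{\ 2}g'_{22}.
\end{aligned}$$
In three dimensions each row would have nine terms, and sixteen in four
dimensions.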

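A set of numbers $Q^{\alpha\beta}{}_{\gamma\delta\epsilon}$, whose indices range from 1 to the dimension of
the space and that transforms as
$$Q^{\alpha\beta}{}_{\gamma\delta\epsilon} = (a^{-1})^{\alpha}_{\ \alpha'}(a^{-1})^{\beta}_{\ \beta'}\,a^{\gamma'}_{\ \gamma}\,a^{\delta'}_{\ \delta}\,a^{\epsilon'}_{\ \epsilon}\,Q'^{\alpha'\beta'}{}_{\gamma'\delta'\epsilon'}, \tag{10.20}$$
or conversely as
$$Q'^{\alpha\beta}{}_{\gamma\delta\epsilon} = a^{\alpha}_{\ \alpha'}\,a^{\beta}_{\ \beta'}\,(a^{-1})^{\gamma'}_{\ \gamma}(a^{-1})^{\delta'}_{\ \delta}(a^{-1})^{\epsilon'}_{\ \epsilon}\,Q^{\alpha'\beta'}{}_{\gamma'\delta'\epsilon'}, \tag{10.21}$$
comprises the components of a doubly contravariant, triply covariant tensor.
More compactly, the $Q^{\alpha\beta}{}_{\gamma\delta\epsilon}$ are the components of a tensor of type (2,3).
Tensors of type $(p,q)$ are defined analogously. The total number of indices
$p + q$ is called the rank of the tensor.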

Note how the indices are wired up in the transformation rules (10.20) 
and (10.21): free (not summed over) upstairs indices on the left hand side 
of the equations match to free upstairs indices on the right hand side, simi- 
larly for the downstairs indices. Also upstairs indices are summed only with 
downstairs ones. 

Similar conditions apply to equations relating tensors in any particular
basis. If they are violated you do not have a valid tensor equation, meaning
that an equation valid in one basis will not be valid in another basis. Thus
an equation
$$A^{\mu}{}_{\nu\lambda} = B^{\mu\tau}{}_{\nu\lambda\tau} + C^{\mu}{}_{\nu\lambda} \tag{10.22}$$
is fine, but
$$A^{\mu}{}_{\nu\lambda} \stackrel{?}{=} B^{\nu}{}_{\mu\lambda} + C^{\mu}{}_{\nu\lambda\sigma\sigma} + D^{\mu}{}_{\nu\tau} \tag{10.23}$$
has something wrong in each term.

Incidentally, although not illegal, it is a good idea not to write tensor
indices directly underneath one another (i.e. do not write $Q^{ij}_{kjl}$) because
if you raise or lower indices using the metric tensor, and some pages later in
a calculation try to put them back where they were, they might end up in
the wrong order.


Tensor algebra 


The sum of two tensors of a given type is also a tensor of that type. The sum
of two tensors of different types is not a tensor. Thus each particular type of
tensor constitutes a distinct vector space, but one derived from the common
underlying vector space whose change-of-basis formula is being utilized.
Tensors can be combined by multiplication: if $A^{\mu}{}_{\nu\lambda}$ and $B^{\mu}{}_{\nu\lambda\tau}$ are tensors
of type (1,2) and (1,3) respectively, then
$$C^{\alpha\beta}{}_{\nu\lambda\rho\sigma\tau} = A^{\alpha}{}_{\nu\lambda}\,B^{\beta}{}_{\rho\sigma\tau} \tag{10.24}$$
is a tensor of type (2,5).
An important operation is contraction, which consists of setting one or
more contravariant indices equal to covariant indices and summing over
the repeated indices. This reduces the rank of the tensor. So, for example,
$$D_{\rho\sigma\tau} = C^{\alpha\beta}{}_{\alpha\beta\rho\sigma\tau} \tag{10.25}$$
is a tensor of type (0,3). Similarly $\mathbf{f}(\mathbf{x}) = f_\mu x^\mu$ is a type (0,0) tensor, i.e. an
invariant, a number that takes the same value in all bases. Upper indices
can only be contracted with lower indices, and vice versa. For example, the
array of numbers $A_\alpha = B_{\alpha\beta\beta}$ obtained from the type (0,3) tensor $B_{\alpha\beta\gamma}$ is not
a tensor of type (0,1).

The contraction procedure outputs a tensor because setting an upper
index and a lower index to a common value $\mu$ and summing over $\mu$ leads to
the factor $\ldots(a^{-1})^{\mu}_{\ \alpha}\,a^{\beta}_{\ \mu}\ldots$ appearing in the transformation rule. Now
$$(a^{-1})^{\mu}_{\ \alpha}\,a^{\beta}_{\ \mu} = \delta^{\beta}_{\alpha}, \tag{10.26}$$
and the Kronecker delta effects a summation over the corresponding pair of
indices in the transformed tensor.

Although often associated with general relativity, tensors occur in many
places in physics. They are used, for example, in elasticity theory, where the
word “tensor” in its modern meaning was introduced by Woldemar Voigt
in 1898. Voigt, following Cauchy and Green, described the infinitesimal
deformation of an elastic body by the strain tensor $e_{\alpha\beta}$, which is a tensor
of type (0,2). The forces to which the strain gives rise are described by the
stress tensor $\sigma^{\lambda\mu}$. A generalization of Hooke’s law relates stress to strain via
a tensor of elastic constants $c^{\lambda\mu\alpha\beta}$ as
$$\sigma^{\lambda\mu} = c^{\lambda\mu\alpha\beta}\,e_{\alpha\beta}. \tag{10.27}$$


We study stress and strain in more detail later in this chapter. 


Exercise 10.1: Show that $g^{\mu\nu}$, the matrix inverse of the metric tensor $g_{\mu\nu}$, is
indeed a doubly contravariant tensor, as the position of its indices suggests.


10.2.2 Tensor character of linear maps and quadratic 
forms 

As an illustration of the tensor concept and of the need to distinguish be- 
tween upstairs and downstairs indices, we contrast the properties of matrices 
representing linear maps and those representing quadratic forms. 

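A linear map $M : V \to V$ is an object that exists independently of any
basis. Given a basis, however, it is represented by a matrix $M^{\mu}_{\ \nu}$ obtained
by examining the action of the map on the basis elements:
$$M(\mathbf{e}_\mu) = \mathbf{e}_\nu M^{\nu}_{\ \mu}. \tag{10.28}$$
Acting on $\mathbf{x}$ we get a new vector $\mathbf{y} = M(\mathbf{x})$, where
$$y^\nu\mathbf{e}_\nu = \mathbf{y} = M(\mathbf{x}) = M(x^\mu\mathbf{e}_\mu) = x^\mu M(\mathbf{e}_\mu) = x^\mu M^{\nu}_{\ \mu}\,\mathbf{e}_\nu = M^{\nu}_{\ \mu}x^\mu\,\mathbf{e}_\nu. \tag{10.29}$$
We therefore have
$$y^\nu = M^{\nu}_{\ \mu}x^\mu, \tag{10.30}$$
which is the usual matrix multiplication $\mathbf{y} = \mathbf{M}\mathbf{x}$. When we change basis,
$\mathbf{e}_\nu = a^{\mu}_{\ \nu}\mathbf{e}'_\mu$, then
$$\mathbf{e}_\nu M^{\nu}_{\ \mu} = M(\mathbf{e}_\mu) = M(a^{\rho}_{\ \mu}\mathbf{e}'_\rho) = a^{\rho}_{\ \mu}M(\mathbf{e}'_\rho) = a^{\rho}_{\ \mu}\,\mathbf{e}'_\sigma M'^{\sigma}_{\ \rho} = a^{\rho}_{\ \mu}(a^{-1})^{\nu}_{\ \sigma}\,\mathbf{e}_\nu M'^{\sigma}_{\ \rho}. \tag{10.31}$$
Comparing coefficients of $\mathbf{e}_\nu$, we find
$$M^{\nu}_{\ \mu} = a^{\rho}_{\ \mu}(a^{-1})^{\nu}_{\ \sigma}M'^{\sigma}_{\ \rho}, \tag{10.32}$$
or, conversely,
$$M'^{\nu}_{\ \mu} = (a^{-1})^{\rho}_{\ \mu}\,a^{\nu}_{\ \sigma}M^{\sigma}_{\ \rho}. \tag{10.33}$$
Thus a matrix representing a linear map has the tensor character suggested
by the position of its indices, i.e. it transforms as a type (1,1) tensor. We can
derive the same formula in matrix notation. In the new basis the vectors $\mathbf{x}$
and $\mathbf{y}$ have new components $\mathbf{x}' = \mathbf{A}\mathbf{x}$, and $\mathbf{y}' = \mathbf{A}\mathbf{y}$. Consequently $\mathbf{y} = \mathbf{M}\mathbf{x}$
becomes
$$\mathbf{y}' = \mathbf{A}\mathbf{y} = \mathbf{A}\mathbf{M}\mathbf{x} = \mathbf{A}\mathbf{M}\mathbf{A}^{-1}\mathbf{x}', \tag{10.34}$$
and the matrix representing the map $M$ has new components
$$\mathbf{M}' = \mathbf{A}\mathbf{M}\mathbf{A}^{-1}. \tag{10.35}$$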


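Now consider the quadratic form $Q : V \to \mathbb{R}$ that is obtained from a
symmetric bilinear form $Q : V \times V \to \mathbb{R}$ by setting $Q(\mathbf{x}) = Q(\mathbf{x},\mathbf{x})$. We can
write
$$Q(\mathbf{x}) = Q_{\mu\nu}x^\mu x^\nu = x^\mu Q_{\mu\nu}x^\nu = \mathbf{x}^T\mathbf{Q}\mathbf{x}, \tag{10.36}$$
where $Q_{\mu\nu} \equiv Q(\mathbf{e}_\mu, \mathbf{e}_\nu)$ are the entries in the symmetric matrix $\mathbf{Q}$, the suffix $T$
denotes transposition, and $\mathbf{x}^T\mathbf{Q}\mathbf{x}$ is standard matrix-multiplication notation.
Just as does the metric tensor, the coefficients $Q_{\mu\nu}$ transform as a type (0,2)
tensor:
$$Q_{\mu\nu} = a^{\alpha}_{\ \mu}a^{\beta}_{\ \nu}\,Q'_{\alpha\beta}. \tag{10.37}$$
In matrix notation the vector $\mathbf{x}$ again transforms to have new components
$\mathbf{x}' = \mathbf{A}\mathbf{x}$, but $\mathbf{x}'^T = \mathbf{x}^T\mathbf{A}^T$. Consequently
$$\mathbf{x}'^T\mathbf{Q}'\mathbf{x}' = \mathbf{x}^T\mathbf{A}^T\mathbf{Q}'\mathbf{A}\mathbf{x}. \tag{10.38}$$
Thus
$$\mathbf{Q} = \mathbf{A}^T\mathbf{Q}'\mathbf{A}. \tag{10.39}$$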


The message is that linear maps and quadratic forms can both be represented 
by matrices, but these matrices correspond to distinct types of tensor and 
transform differently under a change of basis. 

A matrix representing a linear map has a basis-independent determinant. 
Similarly the trace of a matrix representing a linear map 


$$ \mathrm{tr}\, M \stackrel{\rm def}{=} M^\mu{}_\mu, \tag{10.40} $$

is a tensor of type (0,0), i.e. a scalar, and therefore basis independent. On
the other hand, while you can certainly compute the determinant or the trace 
of the matrix representing a quadratic form in some particular basis, when 
you change basis and calculate the determinant or trace of the transformed 
matrix, you will get a different number. 
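
The contrast between the two transformation laws can be checked on a small example. The sketch below is only an illustration: the 2-by-2 array `S` and the change-of-basis matrix `A` are made-up values, and the helper functions are hypothetical conveniences written with the standard library. The same array of numbers is read first as a linear map, transformed by (10.35), and then as a quadratic form, transformed by (10.39).

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def trace(X):
    return X[0][0] + X[1][1]

# Hypothetical data: one symmetric array of numbers, read in two ways.
S = [[F(2), F(1)], [F(1), F(3)]]
A = [[F(2), F(1)], [F(0), F(1)]]   # change of basis x' = A x, with det A = 2
A_inv = inv2(A)

# Read as a linear map: M' = A M A^{-1}, equation (10.35).
M_new = matmul(matmul(A, S), A_inv)
# Read as a quadratic form: Q = A^T Q' A, so Q' = (A^{-1})^T Q A^{-1}, from (10.39).
Q_new = matmul(matmul(transpose(A_inv), S), A_inv)

print("map : trace", trace(S), "->", trace(M_new), " det", det2(S), "->", det2(M_new))
print("form: trace", trace(S), "->", trace(Q_new), " det", det2(S), "->", det2(Q_new))
```

For the map both numbers survive the change of basis; for the form the determinant picks up a factor of $(\det \mathbf{A})^{-2}$ and the trace changes as well.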

It is possible to make a quadratic form out of a linear map, but this
requires using the metric to lower the contravariant index on the matrix
representing the map:

$$ Q(\mathbf{x}) = x^\mu g_{\mu\nu} Q^\nu{}_\lambda x^\lambda = \mathbf{x} \cdot \mathbf{Q}\mathbf{x}. \tag{10.41} $$

Be careful, therefore: the matrices “Q” in $\mathbf{x}^T \mathbf{Q}\, \mathbf{x}$ and in $\mathbf{x} \cdot \mathbf{Q}\mathbf{x}$ are representing
different mathematical objects.


Exercise 10.2: In this problem we will use the distinction between the trans- 
formation law of a quadratic form and that of a linear map to resolve the 
following “paradox”: 


• In quantum mechanics we are taught that the matrices representing two
operators can be simultaneously diagonalized only if they commute.
• In classical mechanics we are taught how, given the Lagrangian

$$ L = \sum_{ij} \left( \frac{1}{2} \dot q_i M_{ij} \dot q_j - \frac{1}{2} q_i V_{ij} q_j \right), $$

to construct normal co-ordinates $Q_i$ such that $L$ becomes

$$ L = \sum_i \left( \frac{1}{2} \dot Q_i^2 - \frac{1}{2} \omega_i^2 Q_i^2 \right). $$

We have apparently managed to simultaneously diagonalize the matrices
$M_{ij} \to \mathrm{diag}(1, \ldots, 1)$ and $V_{ij} \to \mathrm{diag}(\omega_1^2, \ldots, \omega_n^2)$, even though there is no reason for
them to commute with each other!


Show that when $\mathbf{M}$ and $\mathbf{V}$ are a pair of symmetric matrices, with $\mathbf{M}$ being
positive definite, then there exists an invertible matrix $\mathbf{A}$ such that $\mathbf{A}^T \mathbf{M} \mathbf{A}$
and $\mathbf{A}^T \mathbf{V} \mathbf{A}$ are simultaneously diagonal. (Hint: Consider $\mathbf{M}$ as defining an
inner product, and use the Gram-Schmidt procedure to first find an orthonormal
frame in which $M_{ij} = \delta_{ij}$. Then show that the matrix corresponding to
$\mathbf{V}$ in this frame can be diagonalized by a further transformation that does not
perturb the already diagonal $M_{ij}$.)


10.2.3 Tensor product spaces


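We may regard the set of numbers $Q^{\alpha\beta}{}_{\gamma\delta\epsilon}$ as being the components of an
object $Q$ that is an element of the vector space of type (2,3) tensors. We
denote this vector space by the symbol $V \otimes V \otimes V^* \otimes V^* \otimes V^*$, the notation
indicating that it is derived from the original $V$ and its dual $V^*$ by taking
tensor products of these spaces. The tensor $Q$ is to be thought of as existing
as an element of $V \otimes V \otimes V^* \otimes V^* \otimes V^*$ independently of any basis, but given
a basis $\{\mathbf{e}_\mu\}$ for $V$, and the dual basis $\{\mathbf{e}^{*\nu}\}$ for $V^*$, we expand it as

$$ Q = Q^{\alpha\beta}{}_{\gamma\delta\epsilon}\, \mathbf{e}_\alpha \otimes \mathbf{e}_\beta \otimes \mathbf{e}^{*\gamma} \otimes \mathbf{e}^{*\delta} \otimes \mathbf{e}^{*\epsilon}. \tag{10.42} $$

Here the tensor product symbol “$\otimes$” is distributive,

$$ \mathbf{a} \otimes (\mathbf{b} + \mathbf{c}) = \mathbf{a} \otimes \mathbf{b} + \mathbf{a} \otimes \mathbf{c}, \qquad (\mathbf{a} + \mathbf{b}) \otimes \mathbf{c} = \mathbf{a} \otimes \mathbf{c} + \mathbf{b} \otimes \mathbf{c}, \tag{10.43} $$

and associative,

$$ (\mathbf{a} \otimes \mathbf{b}) \otimes \mathbf{c} = \mathbf{a} \otimes (\mathbf{b} \otimes \mathbf{c}), \tag{10.44} $$

but is not commutative:

$$ \mathbf{a} \otimes \mathbf{b} \ne \mathbf{b} \otimes \mathbf{a}. \tag{10.45} $$

Everything commutes with the field, however:

$$ \lambda (\mathbf{a} \otimes \mathbf{b}) = (\lambda \mathbf{a}) \otimes \mathbf{b} = \mathbf{a} \otimes (\lambda \mathbf{b}). \tag{10.46} $$

If we change basis $\mathbf{e}_\alpha = a^\beta{}_\alpha \mathbf{e}'_\beta$ then these rules lead, for example, to

$$ \mathbf{e}_\alpha \otimes \mathbf{e}_\beta = a^\lambda{}_\alpha\, a^\mu{}_\beta\, \mathbf{e}'_\lambda \otimes \mathbf{e}'_\mu. \tag{10.47} $$

From this change-of-basis formula, we deduce that

$$ T^{\alpha\beta}\, \mathbf{e}_\alpha \otimes \mathbf{e}_\beta = T^{\alpha\beta} a^\lambda{}_\alpha\, a^\mu{}_\beta\, \mathbf{e}'_\lambda \otimes \mathbf{e}'_\mu = T'{}^{\lambda\mu}\, \mathbf{e}'_\lambda \otimes \mathbf{e}'_\mu, \tag{10.48} $$

where

$$ T'{}^{\lambda\mu} = T^{\alpha\beta} a^\lambda{}_\alpha\, a^\mu{}_\beta. \tag{10.49} $$

The analogous formula for $\mathbf{e}_\alpha \otimes \mathbf{e}_\beta \otimes \mathbf{e}^{*\gamma} \otimes \mathbf{e}^{*\delta} \otimes \mathbf{e}^{*\epsilon}$ reproduces the
transformation rule for the components of $Q$.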
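
As a small illustration of (10.45) and (10.49), the following sketch stores the components of $\mathbf{a} \otimes \mathbf{b}$ as a nested list and checks that transforming the components of the product agrees with taking the product of the transformed vectors. The matrix `A`, the vectors `a` and `b`, and the helper names are all hypothetical choices made for the example.

```python
from fractions import Fraction as F

def outer(a, b):
    """Components (a ⊗ b)^{αβ} = a^α b^β in the basis e_α ⊗ e_β."""
    return [[x * y for y in b] for x in a]

def transform_vector(A, v):
    """v'^λ = a^λ_α v^α, with A[λ][α] = a^λ_α."""
    return [sum(A[l][al] * v[al] for al in range(len(v))) for l in range(len(A))]

def transform_T(A, T):
    """T'^{λμ} = T^{αβ} a^λ_α a^μ_β, equation (10.49)."""
    n = len(A)
    return [[sum(T[al][be] * A[l][al] * A[m][be]
                 for al in range(n) for be in range(n))
             for m in range(n)] for l in range(n)]

A = [[F(2), F(1)], [F(0), F(1)]]        # hypothetical change-of-basis matrix
a, b = [F(1), F(2)], [F(3), F(-1)]      # hypothetical vectors

T = outer(a, b)
not_commutative = T != outer(b, a)
consistent = transform_T(A, T) == outer(transform_vector(A, a), transform_vector(A, b))
print(not_commutative, consistent)
```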

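The meaning of the tensor product of a collection of vector spaces should
now be clear: If $\mathbf{e}_\mu$ constitute a basis for $V$, the space $V \otimes V$ is, for example,
the space of all linear combinations¹ of the abstract symbols $\mathbf{e}_\mu \otimes \mathbf{e}_\nu$, which
we declare by fiat to constitute a basis for this space. There is no geometric
significance (as there is with a vector product $\mathbf{a} \times \mathbf{b}$) to the tensor product
$\mathbf{a} \otimes \mathbf{b}$, so the $\mathbf{e}_\mu \otimes \mathbf{e}_\nu$ are simply useful place-keepers. Remember that these
are ordered pairs, $\mathbf{e}_\mu \otimes \mathbf{e}_\nu \ne \mathbf{e}_\nu \otimes \mathbf{e}_\mu$.

Although there is no geometric meaning, it is possible, however, to give
an algebraic meaning to a product like $\mathbf{e}^{*\alpha} \otimes \mathbf{e}^{*\beta} \otimes \mathbf{e}^{*\gamma}$ by viewing it as a
multilinear form $V \times V \times V \to \mathbb{R}$. We define

$$ \mathbf{e}^{*\alpha} \otimes \mathbf{e}^{*\beta} \otimes \mathbf{e}^{*\gamma} (\mathbf{e}_\lambda, \mathbf{e}_\mu, \mathbf{e}_\nu) = \delta^\alpha_\lambda\, \delta^\beta_\mu\, \delta^\gamma_\nu. \tag{10.50} $$

We may also regard it as a linear map $V \otimes V \otimes V \to \mathbb{R}$ by defining

$$ \mathbf{e}^{*\alpha} \otimes \mathbf{e}^{*\beta} \otimes \mathbf{e}^{*\gamma} (\mathbf{e}_\lambda \otimes \mathbf{e}_\mu \otimes \mathbf{e}_\nu) = \delta^\alpha_\lambda\, \delta^\beta_\mu\, \delta^\gamma_\nu \tag{10.51} $$

and extending the definition to general elements of $V \otimes V \otimes V$ by linearity.
In this way we establish an isomorphism

$$ V^* \otimes V^* \otimes V^* \cong (V \otimes V \otimes V)^*. \tag{10.52} $$

This multiple personality is typical of tensor spaces. We have already seen
that the metric tensor is simultaneously an element of $V^* \otimes V^*$ and a map
$g : V \to V^*$.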


Tensor products and quantum mechanics 


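When we have two quantum-mechanical systems having Hilbert spaces $\mathcal{H}^{(1)}$
and $\mathcal{H}^{(2)}$, the Hilbert space for the combined system is $\mathcal{H}^{(1)} \otimes \mathcal{H}^{(2)}$. Quantum
mechanics books usually denote the vectors in these spaces by the Dirac “bra-ket”
notation in which the basis vectors of the separate spaces are denoted
by² $|n_1\rangle$ and $|n_2\rangle$, and that of the combined space by $|n_1, n_2\rangle$. In this notation,
a state in the combined system is a linear combination

$$ |\Psi\rangle = \sum_{n_1, n_2} |n_1, n_2\rangle \langle n_1, n_2 | \Psi \rangle. \tag{10.53} $$


¹Do not confuse the tensor-product space $V \otimes W$ with the Cartesian product $V \times W$.
The latter is the set of all ordered pairs $(\mathbf{x}, \mathbf{y})$, $\mathbf{x} \in V$, $\mathbf{y} \in W$. The tensor product includes
also formal sums of such pairs. The Cartesian product of two vector spaces can be given
the structure of a vector space by defining an addition operation
$\lambda(\mathbf{x}_1, \mathbf{y}_1) + \mu(\mathbf{x}_2, \mathbf{y}_2) = (\lambda \mathbf{x}_1 + \mu \mathbf{x}_2, \lambda \mathbf{y}_1 + \mu \mathbf{y}_2)$, but this construction does not lead
to the tensor product. Instead it defines the direct sum $V \oplus W$.

²We assume for notational convenience that the Hilbert spaces are finite dimensional.


This is the tensor product in disguise. To unmask it, we simply make the
notational translation

$$ \begin{aligned} |\Psi\rangle &\to \Psi \\ \langle n_1, n_2 | \Psi \rangle &\to \psi^{n_1 n_2} \\ |n_1\rangle &\to \mathbf{e}^{(1)}_{n_1} \\ |n_2\rangle &\to \mathbf{e}^{(2)}_{n_2} \\ |n_1, n_2\rangle &\to \mathbf{e}^{(1)}_{n_1} \otimes \mathbf{e}^{(2)}_{n_2}. \end{aligned} \tag{10.54} $$

Then (10.53) becomes

$$ \Psi = \psi^{n_1 n_2}\, \mathbf{e}^{(1)}_{n_1} \otimes \mathbf{e}^{(2)}_{n_2}. \tag{10.55} $$

Entanglement: Suppose that $\mathcal{H}^{(1)}$ has basis $\mathbf{e}^{(1)}_1, \ldots, \mathbf{e}^{(1)}_m$ and $\mathcal{H}^{(2)}$ has basis
$\mathbf{e}^{(2)}_1, \ldots, \mathbf{e}^{(2)}_n$. The Hilbert space $\mathcal{H}^{(1)} \otimes \mathcal{H}^{(2)}$ is then $nm$ dimensional. Consider
a state

$$ \Psi = \psi^{ij}\, \mathbf{e}^{(1)}_i \otimes \mathbf{e}^{(2)}_j \in \mathcal{H}^{(1)} \otimes \mathcal{H}^{(2)}. \tag{10.56} $$

If we can find vectors

$$ \Phi \equiv \phi^i \mathbf{e}^{(1)}_i \in \mathcal{H}^{(1)}, \qquad X \equiv \chi^j \mathbf{e}^{(2)}_j \in \mathcal{H}^{(2)}, \tag{10.57} $$

such that

$$ \Psi = \Phi \otimes X \equiv \phi^i \chi^j\, \mathbf{e}^{(1)}_i \otimes \mathbf{e}^{(2)}_j, \tag{10.58} $$

then the tensor $\Psi$ is said to be decomposable and the two quantum systems
are said to be unentangled. If there are no such vectors then the two systems
are entangled in the sense of the Einstein-Podolsky-Rosen (EPR) paradox.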
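
The search for the vectors $\Phi$ and $X$ can be made concrete for a small coefficient array. The sketch below is one possible approach, not a prescribed algorithm: it picks a non-zero entry of $\psi^{ij}$ as a pivot, reads candidate components $\phi^i$ and $\chi^j$ off the pivot's column and row, and then tests whether $\psi^{ij} = \phi^i \chi^j$ holds for every entry. The function name `factorize` and the two example states are hypothetical, and rational amplitudes are used only to keep the arithmetic exact.

```python
from fractions import Fraction as F

def factorize(psi):
    """Try to write psi[i][j] = phi[i] * chi[j]; return (phi, chi) or None."""
    m, n = len(psi), len(psi[0])
    pivot = next(((i, j) for i in range(m) for j in range(n) if psi[i][j] != 0), None)
    if pivot is None:
        return None                                    # the zero vector is not a state
    a, b = pivot
    phi = [psi[i][b] for i in range(m)]                # column through the pivot
    chi = [psi[a][j] / psi[a][b] for j in range(n)]    # pivot row, rescaled
    ok = all(psi[i][j] == phi[i] * chi[j] for i in range(m) for j in range(n))
    return (phi, chi) if ok else None

product_state = [[F(1), F(2)], [F(3), F(6)]]   # components of (1, 3) ⊗ (1, 2)
bell_like = [[F(1), F(0)], [F(0), F(1)]]       # e_1 ⊗ e_1 + e_2 ⊗ e_2, unnormalised

print(factorize(product_state))   # a pair (phi, chi): decomposable
print(factorize(bell_like))       # None: no such pair exists
```

If a factorization exists at all, the pivot's row and column must be proportional to $\chi$ and $\Phi$ respectively, which is why testing this single candidate is enough.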

Quantum states are really in one-to-one correspondence with rays in the
Hilbert space, rather than vectors. If we denote the $n$ dimensional vector
space over the field of the complex numbers as $\mathbb{C}^n$, the space of rays, in which
we do not distinguish between the vectors $\mathbf{x}$ and $\lambda \mathbf{x}$ when $\lambda \ne 0$, is denoted
by $\mathbb{C}P^{n-1}$ and is called complex projective space. Complex projective space is
where algebraic geometry is studied. The set of decomposable states may be
thought of as a subset of the complex projective space $\mathbb{C}P^{nm-1}$, and, since,
as the following exercise shows, this subset is defined by a finite number of
homogeneous polynomial equations, it forms what algebraic geometers call a
variety. This particular subset is known as the Segre variety.


Exercise 10.3: The Segre conditions for a state to be decomposable:


i) By counting the number of independent components that are at our
disposal in $\Psi$, and comparing that number with the number of free
parameters in $\Phi \otimes X$, show that the coefficients $\psi^{ij}$ must satisfy $(n-1)(m-1)$
relations if the state is to be decomposable.

ii) If the state is decomposable, show that

$$ 0 = \begin{vmatrix} \psi^{ij} & \psi^{il} \\ \psi^{kj} & \psi^{kl} \end{vmatrix} $$

for all sets of indices $i$, $j$, $k$, $l$.

iii) Assume that $\psi^{11}$ is not zero. Using your count from part (i) as a guide,
find a subset of the relations from part (ii) that constitute a necessary and
sufficient set of conditions for the state $\Psi$ to be decomposable. Include
a proof that your set is indeed sufficient.


10.2.4 Symmetric and skew-symmetric tensors 


By examining the transformation rule you may see that if a pair of upstairs
or downstairs indices is symmetric (say $Q^{\mu\nu}{}_{\rho\sigma\tau} = Q^{\nu\mu}{}_{\rho\sigma\tau}$) or
skew-symmetric ($Q^{\mu\nu}{}_{\rho\sigma\tau} = -Q^{\nu\mu}{}_{\rho\sigma\tau}$) in one basis, it remains so after the basis
has been changed. (This is not true of a pair composed of one upstairs
and one downstairs index.) It makes sense, therefore, to define symmetric
and skew-symmetric tensor product spaces. Thus skew-symmetric
doubly-contravariant tensors can be regarded as belonging to the space denoted by
$\bigwedge^2 V$ and expanded as

$$ A = \frac{1}{2} A^{\mu\nu}\, \mathbf{e}_\mu \wedge \mathbf{e}_\nu, \tag{10.59} $$

where the coefficients are skew-symmetric, $A^{\mu\nu} = -A^{\nu\mu}$, and the wedge product
of the basis elements is associative and distributive, as is the tensor
product, but in addition obeys $\mathbf{e}_\mu \wedge \mathbf{e}_\nu = -\mathbf{e}_\nu \wedge \mathbf{e}_\mu$. The “1/2” (replaced
by $1/p!$ when there are $p$ indices) is convenient in that each independent
component only appears once in the sum. For example, in three dimensions,

$$ \frac{1}{2} A^{\mu\nu}\, \mathbf{e}_\mu \wedge \mathbf{e}_\nu = A^{12}\, \mathbf{e}_1 \wedge \mathbf{e}_2 + A^{23}\, \mathbf{e}_2 \wedge \mathbf{e}_3 + A^{31}\, \mathbf{e}_3 \wedge \mathbf{e}_1. \tag{10.60} $$


Symmetric doubly-contravariant tensors can be regarded as belonging to
the space $\mathrm{sym}^2 V$ and expanded as

$$ S = S^{\alpha\beta}\, \mathbf{e}_\alpha \odot \mathbf{e}_\beta, \tag{10.61} $$

where $\mathbf{e}_\alpha \odot \mathbf{e}_\beta = \mathbf{e}_\beta \odot \mathbf{e}_\alpha$ and $S^{\alpha\beta} = S^{\beta\alpha}$. (We do not insert a “1/2” here
because including it leads to no particular simplification in any consequent
equations.)

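We can treat these symmetric and skew-symmetric products as symmetric
or skew multilinear forms. Define, for example,

$$ \mathbf{e}^{*\alpha} \wedge \mathbf{e}^{*\beta} (\mathbf{e}_\mu, \mathbf{e}_\nu) = \delta^\alpha_\mu \delta^\beta_\nu - \delta^\alpha_\nu \delta^\beta_\mu, \tag{10.62} $$

and

$$ \mathbf{e}^{*\alpha} \wedge \mathbf{e}^{*\beta} (\mathbf{e}_\mu \wedge \mathbf{e}_\nu) = \delta^\alpha_\mu \delta^\beta_\nu - \delta^\alpha_\nu \delta^\beta_\mu. \tag{10.63} $$

We need two terms on the right-hand side of these examples because the
skew-symmetry of $\mathbf{e}^{*\alpha} \wedge \mathbf{e}^{*\beta}(\ ,\ )$ in its slots does not allow us the luxury of
demanding that the $\mathbf{e}_\mu$ be inserted in the exact order of the $\mathbf{e}^{*\alpha}$ to get a
non-zero answer. Because the $p$-th order analogue of (10.62) has $p!$ terms
on its right-hand side, some authors like to divide the right-hand side by $p!$
in this definition. We prefer the one above, though. With our definition, and
with $A = \frac{1}{2} A_{\mu\nu}\, \mathbf{e}^{*\mu} \wedge \mathbf{e}^{*\nu}$ and $B = \frac{1}{2} B^{\alpha\beta}\, \mathbf{e}_\alpha \wedge \mathbf{e}_\beta$, we have

$$ A(B) = \frac{1}{2} A_{\mu\nu} B^{\mu\nu} = \sum_{\mu < \nu} A_{\mu\nu} B^{\mu\nu}, \tag{10.64} $$

so the sum is only over independent terms.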
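
The two forms of (10.64) can be compared on explicit numbers. In this sketch the skew-symmetric arrays `A` and `B` are arbitrary made-up components in three dimensions. The first quantity evaluates the pairing directly from the definition (10.63), including both factors of 1/2; the other two are the two expressions in (10.64).

```python
from fractions import Fraction as F

d = 3
# Hypothetical skew-symmetric components: A_{μν} (covariant) and B^{μν} (contravariant).
A = [[0, 1, -2], [-1, 0, 3], [2, -3, 0]]
B = [[0, 4, 5], [-4, 0, 6], [-5, -6, 0]]

def delta(i, j):
    return 1 if i == j else 0

idx = range(d)
# Pairing built from (10.63): (1/2)(1/2) A_{μν} B^{αβ} (δ^μ_α δ^ν_β - δ^μ_β δ^ν_α).
from_definition = F(1, 4) * sum(
    A[m][n] * B[a][b] * (delta(m, a) * delta(n, b) - delta(m, b) * delta(n, a))
    for m in idx for n in idx for a in idx for b in idx)

half_full_sum = F(1, 2) * sum(A[m][n] * B[m][n] for m in idx for n in idx)
ordered_sum = sum(A[m][n] * B[m][n] for m in idx for n in range(m + 1, d))

print(from_definition, half_full_sum, ordered_sum)
```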

The wedge ($\wedge$) product notation is standard in mathematics wherever
skew-symmetry is implied.³ The “sym” and $\odot$ are not. Different authors use
different notations for spaces of symmetric tensors. This reflects the fact that 
skew-symmetric tensors are extremely useful and appear in many different 
parts of mathematics, while symmetric ones have fewer special properties 
(although they are common in physics). Compare the relative usefulness of 
determinants and permanents. 


Exercise 10.4: Show that in d dimensions: 


i) the dimension of the space of skew-symmetric covariant tensors with $p$
indices is $d!/p!(d-p)!$;

ii) the dimension of the space of symmetric covariant tensors with $p$ indices
is $(d+p-1)!/p!(d-1)!$.


³Skew products and abstract vector spaces were introduced simultaneously in Hermann
Grassmann’s Ausdehnungslehre (1844). Grassmann’s mathematics was not appreciated in
his lifetime. In his disappointment he turned to other fields, making significant
contributions to the theory of colour mixtures (Grassmann’s law), and to the philology of
Indo-European languages (another Grassmann’s law).


Bosons and fermions


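Spaces of symmetric and skew-symmetric tensors appear whenever we deal
with the quantum mechanics of many indistinguishable particles possessing
Bose or Fermi statistics. If we have a Hilbert space $\mathcal{H}$ of single-particle states
with basis $\mathbf{e}_i$ then the $N$-boson space is $\mathrm{Sym}^N \mathcal{H}$, which consists of states

$$ \Phi = \Phi^{i_1 i_2 \ldots i_N}\, \mathbf{e}_{i_1} \odot \mathbf{e}_{i_2} \odot \cdots \odot \mathbf{e}_{i_N}, \tag{10.65} $$

and the $N$-fermion space is $\bigwedge^N \mathcal{H}$, which contains states

$$ \Psi = \frac{1}{N!} \Psi^{i_1 i_2 \ldots i_N}\, \mathbf{e}_{i_1} \wedge \mathbf{e}_{i_2} \wedge \cdots \wedge \mathbf{e}_{i_N}. \tag{10.66} $$

The symmetry of the Bose wavefunction

$$ \Phi^{i_1 \ldots i_\alpha \ldots i_\beta \ldots i_N} = \Phi^{i_1 \ldots i_\beta \ldots i_\alpha \ldots i_N}, \tag{10.67} $$

and the skew-symmetry of the Fermion wavefunction

$$ \Psi^{i_1 \ldots i_\alpha \ldots i_\beta \ldots i_N} = -\Psi^{i_1 \ldots i_\beta \ldots i_\alpha \ldots i_N}, \tag{10.68} $$

under the interchange of the particle labels $\alpha$, $\beta$ is then natural.

Slater Determinants and the Plücker Relations: Some $N$-fermion states can
be decomposed into a product of single-particle states

$$ \Psi = \psi_1 \wedge \psi_2 \wedge \cdots \wedge \psi_N = \psi_1^{i_1} \psi_2^{i_2} \cdots \psi_N^{i_N}\, \mathbf{e}_{i_1} \wedge \mathbf{e}_{i_2} \wedge \cdots \wedge \mathbf{e}_{i_N}. \tag{10.69} $$

Comparing the coefficients of $\mathbf{e}_{i_1} \wedge \mathbf{e}_{i_2} \wedge \cdots \wedge \mathbf{e}_{i_N}$ in (10.66) and (10.69) shows
that the many-body wavefunction can then be written as

$$ \Psi^{i_1 i_2 \ldots i_N} = \begin{vmatrix} \psi_1^{i_1} & \psi_1^{i_2} & \cdots & \psi_1^{i_N} \\ \psi_2^{i_1} & \psi_2^{i_2} & \cdots & \psi_2^{i_N} \\ \vdots & \vdots & \ddots & \vdots \\ \psi_N^{i_1} & \psi_N^{i_2} & \cdots & \psi_N^{i_N} \end{vmatrix}. \tag{10.70} $$

The wavefunction is therefore given by a single Slater determinant. Such
wavefunctions correspond to a very special class of states. The general
many-fermion state is not decomposable, and its wavefunction can only be
expressed as a sum of many Slater determinants. The Hartree-Fock method
of quantum chemistry is a variational approximation that takes such a single
Slater determinant as its trial wavefunction and varies only the one-particle
wavefunctions $\langle i | \psi_a \rangle \equiv \psi_a^i$. It is a remarkably successful approximation,
given the very restricted class of wavefunctions it explores.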

As with the Segre condition for two distinguishable quantum systems to
be unentangled, there is a set of necessary and sufficient conditions on the
$\Psi^{i_1 i_2 \ldots i_N}$ for the state $\Psi$ to be decomposable into single-particle states. The
conditions are that

$$ \Psi^{i_1 i_2 \ldots i_{N-1} [j_1} \Psi^{j_2 j_3 \ldots j_{N+1}]} = 0 \tag{10.71} $$

for any choice of indices $i_1, \ldots, i_{N-1}$ and $j_1, \ldots, j_{N+1}$. The square brackets
$[\ldots]$ indicate that the expression is to be antisymmetrized over the indices
enclosed in the brackets. For example, a three-particle state is decomposable
if and only if

$$ \Psi^{i_1 i_2 j_1} \Psi^{j_2 j_3 j_4} - \Psi^{i_1 i_2 j_2} \Psi^{j_1 j_3 j_4} + \Psi^{i_1 i_2 j_3} \Psi^{j_1 j_2 j_4} - \Psi^{i_1 i_2 j_4} \Psi^{j_1 j_2 j_3} = 0. \tag{10.72} $$

These conditions are called the Plücker relations after Julius Plücker, who
discovered them long before the advent of quantum mechanics.⁴ It is
easy to show that Plücker’s relations are necessary conditions for
decomposability. It takes more sophistication to show that they are sufficient. We will
therefore defer this task to the exercises at the end of the chapter. As far as
we are aware, the Plücker relations are not exploited by quantum chemists,
but, in disguise as the Hirota bilinear equations, they constitute the geometric
condition underpinning the many-soliton solutions of the Korteweg-de Vries
and other soliton equations.
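
The three-particle relation (10.72) is easy to test by brute force. The following sketch is an illustrative check rather than a proof: it builds the components of a single $3 \times 3$ Slater determinant as in (10.70), and of a sum of two such determinants, in a six-dimensional single-particle space, and then evaluates the left-hand side of (10.72) for every choice of indices. The orbital coefficients and the helper names are hypothetical.

```python
from itertools import permutations, product

def perm_sign(p):
    """Sign of a permutation of 0..n-1, found by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def slater3(psi1, psi2, psi3):
    """Psi^{abc} as the determinant in (10.70) for N = 3 (Leibniz expansion)."""
    orbitals = (psi1, psi2, psi3)
    d = len(psi1)
    return {(a, b, c): sum(perm_sign(p) * orbitals[p[0]][a] * orbitals[p[1]][b] * orbitals[p[2]][c]
                           for p in permutations(range(3)))
            for a, b, c in product(range(d), repeat=3)}

def max_plucker_residual(Psi, d):
    """Largest |left-hand side of (10.72)| over all index choices."""
    worst = 0
    for i1, i2, j1, j2, j3, j4 in product(range(d), repeat=6):
        lhs = (Psi[i1, i2, j1] * Psi[j2, j3, j4] - Psi[i1, i2, j2] * Psi[j1, j3, j4]
               + Psi[i1, i2, j3] * Psi[j1, j2, j4] - Psi[i1, i2, j4] * Psi[j1, j2, j3])
        worst = max(worst, abs(lhs))
    return worst

d = 6
e = [[1 if i == k else 0 for i in range(d)] for k in range(d)]   # basis vectors

# Hypothetical integer orbital coefficients, so that the arithmetic is exact.
single = slater3([1, 2, 0, -1, 3, 1], [0, 1, 1, 2, -2, 0], [2, 0, 1, 1, 1, -1])
first, second = slater3(e[0], e[1], e[2]), slater3(e[3], e[4], e[5])
two_dets = {key: first[key] + second[key] for key in first}     # e1∧e2∧e3 + e4∧e5∧e6

residual_single = max_plucker_residual(single, d)
residual_two = max_plucker_residual(two_dets, d)
print(residual_single, residual_two)
```

The residual vanishes for the single determinant, while the state $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 + \mathbf{e}_4 \wedge \mathbf{e}_5 \wedge \mathbf{e}_6$ violates the relation, for instance at $(i_1, i_2, j_1, j_2, j_3, j_4) = (1, 2, 3, 4, 5, 6)$.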


10.2.5 Kronecker and Levi-Civita tensors 


Suppose the tensor $\delta^\mu_\nu$ is defined, with respect to some basis, to be unity if
$\mu = \nu$ and zero otherwise. In a new basis it will transform to

$$ \delta'{}^\mu_\nu = a^\mu{}_\rho (a^{-1})^\sigma{}_\nu\, \delta^\rho_\sigma = a^\mu{}_\rho (a^{-1})^\rho{}_\nu = \delta^\mu_\nu. \tag{10.73} $$

In other words the Kronecker delta symbol of type (1,1) has the same numerical
components in all co-ordinate systems. This is not true of the Kronecker
delta symbol of type (0,2), i.e. of $\delta_{\mu\nu}$.


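⁴As well as his extensive work in algebraic geometry, Plücker (1801-68) made important
discoveries in experimental physics. He was, for example, the first person to observe the
deflection of cathode rays (beams of electrons) by a magnetic field, and the first to
point out that each element had its characteristic emission spectrum.


Now consider an $n$-dimensional space with a tensor $\eta_{\mu_1 \mu_2 \ldots \mu_n}$ whose
components, in some basis, coincide with the Levi-Civita symbol $\epsilon_{\mu_1 \mu_2 \ldots \mu_n}$. We
find that in a new frame the components are

$$ \begin{aligned} \eta'_{\mu_1 \mu_2 \ldots \mu_n} &= (a^{-1})^{\nu_1}{}_{\mu_1} (a^{-1})^{\nu_2}{}_{\mu_2} \cdots (a^{-1})^{\nu_n}{}_{\mu_n}\, \epsilon_{\nu_1 \nu_2 \ldots \nu_n} \\ &= \epsilon_{\mu_1 \mu_2 \ldots \mu_n}\, (a^{-1})^{\nu_1}{}_{1} (a^{-1})^{\nu_2}{}_{2} \cdots (a^{-1})^{\nu_n}{}_{n}\, \epsilon_{\nu_1 \nu_2 \ldots \nu_n} \\ &= \epsilon_{\mu_1 \mu_2 \ldots \mu_n} \det \mathbf{A}^{-1} \\ &= \eta_{\mu_1 \mu_2 \ldots \mu_n} \det \mathbf{A}^{-1}. \end{aligned} \tag{10.74} $$

Thus, unlike the $\delta^\mu_\nu$, the Levi-Civita symbol is not quite a tensor.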
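
The chain of equalities in (10.74) can be verified numerically in three dimensions. In the sketch below the matrix `B`, standing for the inverse change-of-basis matrix $(a^{-1})^\nu{}_\mu$, is a made-up example and the helper functions are hypothetical. Each transformed component should equal $\det \mathbf{A}^{-1}$ times the corresponding Levi-Civita symbol.

```python
from itertools import permutations, product
from fractions import Fraction as F

def levi_civita(n):
    """Non-zero components of the n-index Levi-Civita symbol, indices 0..n-1."""
    eps = {}
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        eps[p] = -1 if inversions % 2 else 1
    return eps

def det3(B):
    return (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
            - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
            + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))

eps = levi_civita(3)
# Hypothetical inverse change-of-basis matrix, B[nu][mu] = (a^{-1})^nu_mu.
B = [[F(1), F(2), F(0)], [F(0), F(1), F(1)], [F(1), F(0), F(3)]]

# eta'_{m1 m2 m3} = B^{n1}_{m1} B^{n2}_{m2} B^{n3}_{m3} eps_{n1 n2 n3}, first line of (10.74).
eta_new = {m: sum(sign * B[n[0]][m[0]] * B[n[1]][m[1]] * B[n[2]][m[2]]
                  for n, sign in eps.items())
           for m in product(range(3), repeat=3)}

print(det3(B), eta_new[0, 1, 2], eta_new[1, 0, 2], eta_new[0, 0, 1])
```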
Consider also the quantity

$$ \sqrt{g} \stackrel{\rm def}{=} \sqrt{\det [g_{\mu\nu}]}. \tag{10.75} $$

Here we assume that the metric is positive-definite, so that the square root
is real, and that we have taken the positive square root. Since

$$ \det [g'_{\mu\nu}] = \det [(a^{-1})^\rho{}_\mu (a^{-1})^\sigma{}_\nu\, g_{\rho\sigma}] = (\det \mathbf{A})^{-2} \det [g_{\mu\nu}], \tag{10.76} $$

we see that

$$ \sqrt{g'} = |\det \mathbf{A}|^{-1} \sqrt{g}. \tag{10.77} $$

Thus $\sqrt{g}$ is also not quite an invariant. This is only to be expected, because
$g(\ ,\ )$ is a quadratic form and we know that there is no basis-independent
meaning to the determinant of such an object.

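Now define

$$ \varepsilon_{\mu_1 \mu_2 \ldots \mu_n} = \sqrt{g}\, \epsilon_{\mu_1 \mu_2 \ldots \mu_n}, \tag{10.78} $$

where $\epsilon$ denotes the Levi-Civita symbol and $\varepsilon$ the new object,
and assume that $\varepsilon_{\mu_1 \mu_2 \ldots \mu_n}$ has the type $(0, n)$ tensor character implied by
its indices. When we look at how this transforms, and restrict ourselves
to orientation preserving changes of basis, i.e. ones for which $\det \mathbf{A}$ is
positive, we see that factors of $\det \mathbf{A}$ conspire to give

$$ \varepsilon'_{\mu_1 \mu_2 \ldots \mu_n} = \sqrt{g'}\, \epsilon_{\mu_1 \mu_2 \ldots \mu_n}. \tag{10.79} $$

A similar exercise indicates that if we define $\epsilon^{\mu_1 \mu_2 \ldots \mu_n}$ to be numerically equal
to $\epsilon_{\mu_1 \mu_2 \ldots \mu_n}$ then

$$ \varepsilon^{\mu_1 \mu_2 \ldots \mu_n} = \frac{1}{\sqrt{g}}\, \epsilon^{\mu_1 \mu_2 \ldots \mu_n} \tag{10.80} $$

also transforms as a tensor (in this case a type $(n, 0)$ contravariant one)
provided that the factor of $1/\sqrt{g}$ is always calculated with respect to the
current basis.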

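If the dimension $n$ is even and we are given a skew-symmetric tensor $F_{\mu\nu}$,
we can therefore construct an invariant

$$ \varepsilon^{\mu_1 \mu_2 \ldots \mu_n} F_{\mu_1 \mu_2} \cdots F_{\mu_{n-1} \mu_n} = \frac{1}{\sqrt{g}}\, \epsilon^{\mu_1 \mu_2 \ldots \mu_n} F_{\mu_1 \mu_2} \cdots F_{\mu_{n-1} \mu_n}. \tag{10.81} $$

Similarly, given a skew-symmetric covariant tensor $F_{\mu_1 \ldots \mu_m}$ with $m$ ($\le n$)
indices we can form its dual, denoted by $F^*$, an $(n-m)$-contravariant tensor
with components

$$ (F^*)^{\mu_{m+1} \ldots \mu_n} = \frac{1}{m!}\, \varepsilon^{\mu_1 \mu_2 \ldots \mu_n} F_{\mu_1 \ldots \mu_m} = \frac{1}{\sqrt{g}} \frac{1}{m!}\, \epsilon^{\mu_1 \mu_2 \ldots \mu_n} F_{\mu_1 \ldots \mu_m}. \tag{10.82} $$

We meet this “dual” tensor again when we study differential forms.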


10.3 Cartesian tensors 


If we restrict ourselves to Cartesian co-ordinate systems having orthonormal
basis vectors, so that $g_{ij} = \delta_{ij}$, then there are considerable simplifications.
In particular, we do not have to make a distinction between co- and
contra-variant indices. We shall usually write their indices as roman-alphabet
suffixes.

A change of basis from one orthonormal $n$-dimensional basis $\mathbf{e}_i$ to another
$\mathbf{e}'_i$ will set

$$ \mathbf{e}'_i = O_{ij}\, \mathbf{e}_j, \tag{10.83} $$

where the numbers $O_{ij}$ are the entries in an orthogonal matrix $\mathbf{O}$, i.e. a real
matrix obeying $\mathbf{O}^T \mathbf{O} = \mathbf{O} \mathbf{O}^T = \mathbf{I}$, where $T$ denotes the transpose. The set
of $n$-by-$n$ orthogonal matrices constitutes the orthogonal group $O(n)$.


10.3.1 Isotropic tensors 


The Kronecker $\delta_{ij}$ with both indices downstairs is unchanged by $O(n)$
transformations,

$$ \delta'_{ij} = O_{ik} O_{jl}\, \delta_{kl} = O_{ik} O_{jk} = O_{ik} O^T_{kj} = \delta_{ij}, \tag{10.84} $$

and has the same components in any Cartesian frame. We say that its
components are numerically invariant. A similar property holds for tensors
made up of products of $\delta_{ij}$, such as

$$ T_{ijklmn} = \delta_{ij} \delta_{kl} \delta_{mn}. \tag{10.85} $$

It is possible to show⁵ that any tensor whose components are numerically
invariant under all orthogonal transformations is a sum of products of this
form. The most general $O(n)$ invariant tensor of rank four is, for example,

$$ \alpha\, \delta_{ij} \delta_{kl} + \beta\, \delta_{ik} \delta_{lj} + \gamma\, \delta_{il} \delta_{jk}. \tag{10.86} $$

The determinant of an orthogonal transformation must be $\pm 1$. If we only
allow orientation-preserving changes of basis then we restrict ourselves to
orthogonal transformations $O_{ij}$ with $\det \mathbf{O} = 1$. These are the proper
orthogonal transformations. In $n$ dimensions they constitute the group $SO(n)$.
Under $SO(n)$ transformations, both $\delta_{ij}$ and $\epsilon_{i_1 i_2 \ldots i_n}$ are numerically invariant
and the most general $SO(n)$ invariant tensors consist of sums of products of
$\delta_{ij}$’s and $\epsilon_{i_1 i_2 \ldots i_n}$’s. The most general $SO(4)$-invariant rank-four tensor is, for
example,

$$ \alpha\, \delta_{ij} \delta_{kl} + \beta\, \delta_{ik} \delta_{lj} + \gamma\, \delta_{il} \delta_{jk} + \lambda\, \epsilon_{ijkl}. \tag{10.87} $$

Tensors that are numerically invariant under $SO(n)$ are known as isotropic
tensors.
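
Numerical invariance can be checked directly for a particular rotation and a particular reflection. The sketch below is a spot check under stated assumptions, not a demonstration for the whole group: the rotation `R` (built from a 3-4-5 triangle so that its entries are rational), the reflection `P`, the coefficients in the rank-four tensor, and the helper names are all hypothetical choices.

```python
from fractions import Fraction as F
from itertools import product

def delta(i, j):
    return 1 if i == j else 0

def epsilon(i, j, k):
    """Three-index Levi-Civita symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def transform(T, O, rank):
    """T'_{ij...} = O_{ia} O_{jb} ... T_{ab...}; tensor stored as {index tuple: value}."""
    n = len(O)
    out = {}
    for idx in product(range(n), repeat=rank):
        total = 0
        for src in product(range(n), repeat=rank):
            term = T[src]
            for i, a in zip(idx, src):
                term *= O[i][a]
            total += term
        out[idx] = total
    return out

# Hypothetical test matrices: a proper rotation about the 3-axis and a reflection.
R = [[F(3, 5), F(-4, 5), 0], [F(4, 5), F(3, 5), 0], [0, 0, 1]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]

alpha, beta, gamma = 2, -1, 5   # arbitrary coefficients in (10.86)
T4 = {(i, j, k, l): alpha * delta(i, j) * delta(k, l) + beta * delta(i, k) * delta(l, j)
      + gamma * delta(i, l) * delta(j, k) for i, j, k, l in product(range(3), repeat=4)}
eps3 = {(i, j, k): epsilon(i, j, k) for i, j, k in product(range(3), repeat=3)}

rank4_unchanged = transform(T4, R, 4) == T4 and transform(T4, P, 4) == T4
eps_rotated = transform(eps3, R, 3)
eps_reflected = transform(eps3, P, 3)
print(rank4_unchanged, eps_rotated == eps3, eps_reflected == {k: -v for k, v in eps3.items()})
```

The rank-four combination of deltas survives both transformations, whereas the Levi-Civita symbol survives the rotation but changes sign under the reflection.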

As there is no longer any distinction between co- and contravariant
indices, we can now contract any pair of indices. In three dimensions, for
example,

$$ B_{ijkl} = \epsilon_{nij}\, \epsilon_{nkl} \tag{10.88} $$

is a rank-four isotropic tensor. Now $\epsilon_{i_1 \ldots i_n}$ is not invariant when we transform
via an orthogonal transformation with $\det \mathbf{O} = -1$, but the product of two
$\epsilon$’s is invariant under such transformations. The tensor $B_{ijkl}$ is therefore
numerically invariant under the larger group $O(3)$ and must be expressible
as

$$ B_{ijkl} = \alpha\, \delta_{ij} \delta_{kl} + \beta\, \delta_{ik} \delta_{lj} + \gamma\, \delta_{il} \delta_{jk} \tag{10.89} $$

for some coefficients $\alpha$, $\beta$ and $\gamma$. The following exercise explores some
consequences of this and related facts.


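⁵The proof is surprisingly complicated. See, for example, M. Spivak, A Comprehensive
Introduction to Differential Geometry (second edition) Vol. V, pp. 466-481.


Exercise 10.5: We defined the $n$-dimensional Levi-Civita symbol by requiring
that $\epsilon_{i_1 i_2 \ldots i_n}$ be antisymmetric in all pairs of indices, and $\epsilon_{12 \ldots n} = 1$.


a) Show that $\epsilon_{123} = \epsilon_{231} = \epsilon_{312}$, but that $\epsilon_{1234} = -\epsilon_{2341} = \epsilon_{3412} = -\epsilon_{4123}$.

b) Show that

$$ \epsilon_{ijk}\, \epsilon_{i'j'k'} = \delta_{ii'} \delta_{jj'} \delta_{kk'} + \text{five other terms}, $$

where you should write out all six terms explicitly.

c) Show that $\epsilon_{ijk}\, \epsilon_{ij'k'} = \delta_{jj'} \delta_{kk'} - \delta_{jk'} \delta_{kj'}$.

d) For dimension $n = 4$, write out $\epsilon_{ijkl}\, \epsilon_{ij'k'l'}$ as a sum of products of $\delta$’s
similar to the one in part (c).


Exercise 10.6: Vector Products. The vector product of two three-vectors may
be written in Cartesian components as $(\mathbf{a} \times \mathbf{b})_i = \epsilon_{ijk} a_j b_k$. Use this and your
results about $\epsilon_{ijk}$ from the previous exercise to show that

i) $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \mathbf{b} \cdot (\mathbf{c} \times \mathbf{a}) = \mathbf{c} \cdot (\mathbf{a} \times \mathbf{b})$,

ii) $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}$,

iii) $(\mathbf{a} \times \mathbf{b}) \cdot (\mathbf{c} \times \mathbf{d}) = (\mathbf{a} \cdot \mathbf{c})(\mathbf{b} \cdot \mathbf{d}) - (\mathbf{a} \cdot \mathbf{d})(\mathbf{b} \cdot \mathbf{c})$.

iv) If we take $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{d}$, with $\mathbf{d} \equiv \mathbf{b}$, to be unit vectors, show that
the identities (i) and (iii) become the sine and cosine rule, respectively,
of spherical trigonometry. (Hint: for the spherical sine rule, begin by
showing that $\mathbf{a} \cdot [(\mathbf{a} \times \mathbf{b}) \times (\mathbf{a} \times \mathbf{c})] = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$.)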


10.3.2 Stress and strain 


As an illustration of the utility of Cartesian tensors, we consider their appli- 
cation to elasticity. 

Suppose that an elastic body is slightly deformed so that the particle that
was originally at the point with Cartesian co-ordinates $x_i$ is moved to $x_i + \eta_i$.
We define the (infinitesimal) strain tensor $e_{ij}$ by

$$ e_{ij} = \frac{1}{2} \left( \frac{\partial \eta_j}{\partial x_i} + \frac{\partial \eta_i}{\partial x_j} \right). \tag{10.90} $$

It is automatically symmetric: $e_{ij} = e_{ji}$. We will leave for later (exercise
11.3) a discussion of why this is the natural definition of strain, and also
the modifications necessary were we to employ a non-Cartesian co-ordinate
system.

To define the stress tensor $\sigma_{ij}$ we consider the portion $\Omega$ of the body in
figure 10.1, and an element of area $d\mathbf{S} = \mathbf{n}\, d|S|$ on its boundary. Here, $\mathbf{n}$ is
the unit normal vector pointing out of $\Omega$. The force $\mathbf{F}$ exerted on this surface
element by the parts of the body exterior to $\Omega$ has components

$$ F_i = \sigma_{ij}\, n_j\, d|S|. \tag{10.91} $$

Figure 10.1: Stress forces.


That $\mathbf{F}$ is a linear function of $\mathbf{n}\, d|S|$ can be seen by considering the forces
on a small tetrahedron, three of whose sides coincide with the co-ordinate
planes, the fourth side having $\mathbf{n}$ as its normal. In the limit that the lengths
of the sides go to zero as $\epsilon$, the mass of the body scales to zero as $\epsilon^3$, but
the forces are proportional to the areas of the sides and go to zero only as $\epsilon^2$.
Only if the linear relation holds true can the acceleration of the tetrahedron
remain finite. A similar argument applied to torques and the moment of
inertia of a small cube shows that $\sigma_{ij} = \sigma_{ji}$.
A generalization of Hooke’s law,

$$ \sigma_{ij} = c_{ijkl}\, e_{kl}, \tag{10.92} $$

relates the stress to the strain via the tensor of elastic constants $c_{ijkl}$. This
rank-four tensor has the symmetry properties

$$ c_{ijkl} = c_{klij} = c_{jikl} = c_{ijlk}. \tag{10.93} $$

In other words, the tensor is symmetric under the interchange of the first
and second pairs of indices, and also under the interchange of the individual
indices in either pair.

For an isotropic material (a material whose properties are invariant
under the rotation group $SO(3)$) the tensor of elastic constants must be an
isotropic tensor. The most general such tensor with the required symmetries
is

$$ c_{ijkl} = \lambda\, \delta_{ij} \delta_{kl} + \mu (\delta_{ik} \delta_{jl} + \delta_{il} \delta_{jk}). \tag{10.94} $$

An isotropic material is therefore characterized by only two independent
parameters, $\lambda$ and $\mu$. These are called the Lamé constants after the
mathematical engineer Gabriel Lamé. In terms of them the generalized Hooke’s law
becomes

$$ \sigma_{ij} = \lambda\, \delta_{ij}\, e_{kk} + 2\mu\, e_{ij}. \tag{10.95} $$
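
The step from (10.94) to (10.95) is a contraction that can be checked numerically. In the sketch below the Lamé constants and the symmetric strain array are arbitrary made-up values and the function names are hypothetical. The stress is computed once by contracting the isotropic tensor (10.94) with the strain as in (10.92), and once from the closed form $\sigma_{ij} = \lambda \delta_{ij} e_{kk} + 2\mu e_{ij}$ of (10.95).

```python
from fractions import Fraction as F

def delta(i, j):
    return 1 if i == j else 0

def c_iso(lam, mu, i, j, k, l):
    """Isotropic elastic tensor, equation (10.94)."""
    return (lam * delta(i, j) * delta(k, l)
            + mu * (delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k)))

def stress_from_tensor(lam, mu, e):
    """sigma_ij = c_ijkl e_kl, equation (10.92)."""
    return [[sum(c_iso(lam, mu, i, j, k, l) * e[k][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]

def stress_from_lame(lam, mu, e):
    """sigma_ij = lam delta_ij e_kk + 2 mu e_ij, equation (10.95)."""
    tr = sum(e[k][k] for k in range(3))
    return [[lam * delta(i, j) * tr + 2 * mu * e[i][j] for j in range(3)] for i in range(3)]

lam, mu = F(3), F(2)   # hypothetical Lamé constants, arbitrary units
e = [[F(1), F(1, 2), F(0)],
     [F(1, 2), F(-2), F(1, 3)],
     [F(0), F(1, 3), F(4)]]   # hypothetical symmetric strain

agree = stress_from_tensor(lam, mu, e) == stress_from_lame(lam, mu, e)
print(agree)
```

The two expressions agree only because the strain is symmetric; for a non-symmetric array the $\mu$ term would give $\mu(e_{ij} + e_{ji})$ instead of $2\mu e_{ij}$.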


By considering particular deformations, we can express the more directly
measurable bulk modulus, shear modulus, Young’s modulus and Poisson’s
ratio in terms of $\lambda$ and $\mu$.

The bulk modulus $\kappa$ is defined by

$$ dP = -\kappa \frac{dV}{V}, \tag{10.96} $$

where an infinitesimal isotropic external pressure $dP$ causes a change
$V \to V + dV$ in the volume of the material. This applied pressure corresponds to
a surface stress of $\sigma_{ij} = -\delta_{ij}\, dP$. An isotropic expansion displaces points in
the material so that

$$ \eta_i = \frac{1}{3} \frac{dV}{V} x_i. \tag{10.97} $$

The strains are therefore given by

$$ e_{ij} = \frac{1}{3} \delta_{ij} \frac{dV}{V}. \tag{10.98} $$

Inserting this strain into the stress-strain relation gives

$$ \sigma_{ij} = \delta_{ij} \left( \lambda + \frac{2}{3} \mu \right) \frac{dV}{V} = -\delta_{ij}\, dP. \tag{10.99} $$

Thus

$$ \kappa = \lambda + \frac{2}{3} \mu. \tag{10.100} $$
To define the shear modulus, we assume a deformation $\eta_1 = \theta x_2$, so
$e_{12} = e_{21} = \theta/2$, with all other $e_{ij}$ vanishing.


Figure 10.2: Shear strain. The arrows show the direction of the applied
stresses. The $\sigma_{21}$ on the vertical faces are necessary to stop the body rotating.


The applied shear stress is $\sigma_{12} = \sigma_{21}$. The shear modulus is defined to be
$\sigma_{12}/\theta$. Inserting the strain components into the stress-strain relation gives

$$ \sigma_{12} = \mu \theta, \tag{10.101} $$

and so the shear modulus is equal to the Lamé constant $\mu$. We can therefore
write the generalized Hooke’s law as

$$ \sigma_{ij} = 2\mu \left( e_{ij} - \frac{1}{3} \delta_{ij} e_{kk} \right) + \kappa\, e_{kk}\, \delta_{ij}, \tag{10.102} $$

which reveals that the shear modulus is associated with the traceless part of
the strain tensor, and the bulk modulus with the trace.

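Young’s modulus $Y$ is measured by stretching a wire of initial length $L$
and square cross section of side $W$ under a tension $T = \sigma_{33} W^2$.


Figure 10.3: Forces on a stretched wire.


We define $Y$ so that

$$ \sigma_{33} = Y \frac{dL}{L}. \tag{10.103} $$

At the same time as the wire stretches, its width changes $W \to W + dW$.
Poisson’s ratio $\sigma$ is defined by

$$ \frac{dW}{W} = -\sigma \frac{dL}{L}, \tag{10.104} $$

so that $\sigma$ is positive if the wire gets thinner as it gets longer. The
displacements are

$$ \eta_3 = z \left( \frac{dL}{L} \right), \qquad \eta_1 = x \left( \frac{dW}{W} \right) = -\sigma x \left( \frac{dL}{L} \right), \qquad \eta_2 = y \left( \frac{dW}{W} \right) = -\sigma y \left( \frac{dL}{L} \right), \tag{10.105} $$

so the strain components are

$$ e_{33} = \frac{dL}{L}, \qquad e_{11} = e_{22} = \frac{dW}{W} = -\sigma e_{33}. \tag{10.106} $$

We therefore have

$$ \sigma_{33} = \left( \lambda (1 - 2\sigma) + 2\mu \right) \left( \frac{dL}{L} \right), \tag{10.107} $$

leading to

$$ Y = \lambda (1 - 2\sigma) + 2\mu. \tag{10.108} $$

Now, the side of the wire is a free surface with no forces acting on it, so

$$ 0 = \sigma_{22} = \sigma_{11} = \left( \lambda (1 - 2\sigma) - 2\sigma\mu \right) \left( \frac{dL}{L} \right). \tag{10.109} $$

This tells us that⁶

$$ \sigma = \frac{1}{2} \frac{\lambda}{\lambda + \mu}, \tag{10.110} $$

and

$$ Y = \mu \left( \frac{3\lambda + 2\mu}{\lambda + \mu} \right). \tag{10.111} $$

Other relations, following from those above, are

$$ Y = 3\kappa (1 - 2\sigma) = 2\mu (1 + \sigma). \tag{10.112} $$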
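
The relations among the moduli are simple enough to check with exact arithmetic. The following sketch uses made-up Lamé constants in arbitrary units and a hypothetical helper `moduli`. It evaluates (10.100), (10.110) and (10.111) and then compares the result with the three expressions for $Y$ in (10.108) and (10.112).

```python
from fractions import Fraction as F

def moduli(lam, mu):
    """Bulk modulus, Young's modulus and Poisson's ratio from the Lamé constants."""
    kappa = lam + F(2, 3) * mu                     # (10.100)
    poisson = lam / (2 * (lam + mu))               # (10.110)
    young = mu * (3 * lam + 2 * mu) / (lam + mu)   # (10.111)
    return kappa, young, poisson

lam, mu = F(3), F(2)                               # hypothetical values
kappa, young, poisson = moduli(lam, mu)

checks = (young == lam * (1 - 2 * poisson) + 2 * mu,   # (10.108)
          young == 3 * kappa * (1 - 2 * poisson),      # (10.112), first form
          young == 2 * mu * (1 + poisson))             # (10.112), second form
print(kappa, young, poisson, checks)
```

Setting $\lambda = \mu$ in `moduli` gives a Poisson ratio of exactly 1/4.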


⁶Poisson and Cauchy erroneously believed that $\lambda = \mu$, and hence that $\sigma = 1/4$.


Figure 10.4: Bent beam.


Exercise 10.7: Show that the symmetries

$$ c_{ijkl} = c_{klij} = c_{jikl} = c_{ijlk} $$

imply that a general homogeneous material has 21 independent elastic
constants. (This result was originally obtained by George Green, of Green
function fame.)


Exercise 10.8: A steel beam is forged so that its cross section has the shape
of a region $\Gamma \subset \mathbb{R}^2$. When undeformed, it lies along the $z$ axis. The centroid
$O$ of each cross section is defined so that

$$ \int_\Gamma x\, dx\, dy = \int_\Gamma y\, dx\, dy = 0, $$

when the co-ordinates $x$, $y$ are taken with the centroid $O$ as the origin. The
beam is slightly bent away from the $z$ axis so that the line of centroids remains
in the $y$, $z$ plane. (See figure 10.4.) At a particular cross section with centroid
$O$, the line of centroids has radius of curvature $R$.


Assume that the deformation in the vicinity of $O$ is such that

$$ \begin{aligned} \eta_x &= -\frac{\sigma}{R} xy, \\ \eta_y &= \frac{1}{2R} \left\{ \sigma (x^2 - y^2) - z^2 \right\}, \\ \eta_z &= \frac{1}{R} yz. \end{aligned} $$

Observe that for this assumed deformation, and for a positive Poisson ratio,
the cross section deforms anticlastically: the sides bend up as the beam
bends down. This is shown in figure 10.5.


Compute the strain tensor resulting from the assumed deformation, and show
that its only non-zero components are

$$ e_{xx} = -\frac{\sigma}{R} y, \qquad e_{yy} = -\frac{\sigma}{R} y, \qquad e_{zz} = \frac{1}{R} y. $$

Figure 10.5: The original (dashed) and anticlastically deformed (full)
cross-section.


Next, show that

$$ \sigma_{zz} = \left( \frac{Y}{R} \right) y, $$

and that all other components of the stress tensor vanish. Deduce from this
vanishing that the assumed deformation satisfies the free-surface boundary
condition, and so is indeed the way the beam responds when it is bent by
forces applied at its ends.


The work done in bending the beam

$$ \int_{\rm beam} \frac{1}{2} e_{ij}\, c_{ijkl}\, e_{kl}\, d^3x $$

is stored as elastic energy. Show that for our bent rod this energy is equal to

$$ \int \frac{YI}{2} \left( \frac{1}{R^2} \right) ds \approx \int \frac{YI}{2} (y'')^2\, dz, $$

where $s$ is the arc-length taken along the line of centroids of the beam,

$$ I = \int_\Gamma y^2\, dx\, dy $$

is the moment of inertia of the region $\Gamma$ about the $x$ axis, and $y''$ denotes
the second derivative of the deflection of the beam with respect to $z$ (which
approximates the arc-length). This last formula for the strain energy has been
used in a number of our calculus-of-variations problems.


Figure 10.6: The distribution of forces $\sigma_{zz}$ exerted on the left-hand part of
the bent rod by the material to its right.


10.3.3 Maxwell stress tensor 


Consider a small cubical element of an elastic body. If the stress tensor were
position independent, the external forces on each pair of opposing faces of
the cube would be equal in magnitude but pointing in opposite directions.
There would therefore be no net external force on the cube. When $\sigma_{ij}$ is not
constant then we claim that the total force acting on an infinitesimal element
of volume $dV$ is

$$ F_i = \partial_j \sigma_{ij}\, dV. \tag{10.113} $$

To see that this assertion is correct, consider a finite region $\Omega$ with boundary
$\partial\Omega$, and use the divergence theorem to write the total force on $\Omega$ as

$$ F_i^{\rm tot} = \int_{\partial\Omega} \sigma_{ij}\, n_j\, d|S| = \int_\Omega \partial_j \sigma_{ij}\, dV. \tag{10.114} $$
an Q 


Whenever the force-per-unit-volume $f_i$ acting on a body can be written
in the form $f_i = \partial_j \sigma_{ij}$, we refer to $\sigma_{ij}$ as a “stress tensor,” by analogy with
stress in an elastic solid. As an example, let $\mathbf{E}$ and $\mathbf{B}$ be electric and magnetic
fields. For simplicity, initially assume them to be static. The force per unit
volume exerted by these fields on a distribution of charge $\rho$ and current $\mathbf{j}$ is

$$ \mathbf{f} = \rho \mathbf{E} + \mathbf{j} \times \mathbf{B}. \tag{10.115} $$


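From Gauss’ law $\rho = \mathrm{div}\, \mathbf{D}$, and with $\mathbf{D} = \epsilon_0 \mathbf{E}$, we find that the force per
unit volume due to the electric field has components

$$ \begin{aligned} \rho E_i = (\partial_j D_j) E_i &= \epsilon_0 \left( \partial_j (E_i E_j) - E_j\, \partial_j E_i \right) \\ &= \epsilon_0 \left( \partial_j (E_i E_j) - E_j\, \partial_i E_j \right) \\ &= \epsilon_0\, \partial_j \left( E_i E_j - \frac{1}{2} \delta_{ij} |\mathbf{E}|^2 \right). \end{aligned} \tag{10.116} $$

Here, in passing from the first line to the second, we have used the fact that
$\mathrm{curl}\, \mathbf{E}$ is zero for static fields, and so $\partial_j E_i = \partial_i E_j$. Similarly, using $\mathbf{j} = \mathrm{curl}\, \mathbf{H}$,
together with $\mathbf{B} = \mu_0 \mathbf{H}$ and $\mathrm{div}\, \mathbf{B} = 0$, we find that the force per unit volume
due to the magnetic field has components

$$ (\mathbf{j} \times \mathbf{B})_i = \mu_0\, \partial_j \left( H_i H_j - \frac{1}{2} \delta_{ij} |\mathbf{H}|^2 \right). \tag{10.117} $$

The quantity

$$ \sigma_{ij} = \epsilon_0 \left( E_i E_j - \frac{1}{2} \delta_{ij} |\mathbf{E}|^2 \right) + \mu_0 \left( H_i H_j - \frac{1}{2} \delta_{ij} |\mathbf{H}|^2 \right) \tag{10.118} $$

is called the Maxwell stress tensor. Its utility lies in the fact that the
total electromagnetic force on an isolated body is the integral of the Maxwell
stress over its surface. We do not need to know the fields within the body.

Michael Faraday was the first to intuit a picture of electromagnetic stresses
and attributed both a longitudinal tension and a mutual lateral repulsion to
the field lines. Maxwell’s tensor expresses this idea mathematically.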
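
Faraday's picture can be read off from the components of the tensor for a uniform field. The sketch below assumes the standard vacuum form $\sigma_{ij} = \epsilon_0 (E_i E_j - \frac{1}{2} \delta_{ij} |\mathbf{E}|^2) + \mu_0 (H_i H_j - \frac{1}{2} \delta_{ij} |\mathbf{H}|^2)$ in SI units; the function name `maxwell_stress` and the field values are hypothetical choices for the example.

```python
EPS0 = 8.8541878128e-12    # vacuum permittivity in F/m
MU0 = 1.25663706212e-6     # vacuum permeability in H/m

def maxwell_stress(E, H):
    """Maxwell stress tensor for fields E (V/m) and H (A/m), assumed vacuum form."""
    E2 = sum(x * x for x in E)
    H2 = sum(x * x for x in H)
    return [[EPS0 * (E[i] * E[j] - (0.5 * E2 if i == j else 0.0))
             + MU0 * (H[i] * H[j] - (0.5 * H2 if i == j else 0.0))
             for j in range(3)] for i in range(3)]

# Hypothetical uniform electric field along the z axis, no magnetic field.
sigma = maxwell_stress(E=[0.0, 0.0, 1.0e5], H=[0.0, 0.0, 0.0])
print(sigma[2][2], sigma[0][0], sigma[0][2])
```

The component along the field is positive (a tension along the field lines), the transverse diagonal components are negative and of the same magnitude (a lateral pressure), and the off-diagonal components vanish.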


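Exercise 10.9: Allow the fields in the preceding calculation to be time
dependent. Show that Maxwell’s equations

$$ \mathrm{curl}\, \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad \mathrm{div}\, \mathbf{B} = 0, $$

$$ \mathrm{curl}\, \mathbf{H} = \mathbf{j} + \frac{\partial \mathbf{D}}{\partial t}, \qquad \mathrm{div}\, \mathbf{D} = \rho, $$

with $\mathbf{B} = \mu_0 \mathbf{H}$, $\mathbf{D} = \epsilon_0 \mathbf{E}$, and $c = 1/\sqrt{\mu_0 \epsilon_0}$, lead to

$$ (\rho \mathbf{E} + \mathbf{j} \times \mathbf{B})_i + \frac{\partial}{\partial t} \left\{ \frac{1}{c^2} (\mathbf{E} \times \mathbf{H})_i \right\} = \partial_j \sigma_{ij}. $$

The left-hand side is the time rate of change of the mechanical (first term)
and electromagnetic (second term) momentum density. Observe that we can
equivalently write

$$ \frac{\partial}{\partial t} \left\{ \frac{1}{c^2} (\mathbf{E} \times \mathbf{H})_i \right\} + \partial_j (-\sigma_{ij}) = -(\rho \mathbf{E} + \mathbf{j} \times \mathbf{B})_i, $$

and think of this as a local field-momentum conservation law. In this
interpretation $-\sigma_{ij}$ is thought of as the momentum flux tensor, its entries being the
flux in direction $j$ of the component of field momentum in direction $i$. The
term on the right-hand side is the rate at which momentum is being supplied
to the electro-magnetic field by the charges and currents.


10.4 Further exercises and problems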


Exercise 10.10: Quotient theorem. Suppose that you have come up with some 
recipe for generating an array of numbers $T^{ijk}$ in any co-ordinate frame, and 
want to know whether these numbers are the components of a triply 
contravariant tensor. Suppose further that you know that, given the components 
$a_{ij}$ of an arbitrary doubly covariant tensor, the numbers 

$$ T^{ijk} a_{jk} = v^i $$

transform as the components of a contravariant vector. Show that $T^{ijk}$ does 
indeed transform as a triply contravariant tensor. (The natural generalization 
of this result to arbitrary tensor types is known as the quotient theorem.) 


Exercise 10.11: Let $T$ be the 3-by-3 array of components of a tensor. Show 
that the quantities 

$$ a = T^i{}_i, \qquad b = T^i{}_j T^j{}_i, \qquad c = T^i{}_j T^j{}_k T^k{}_i $$

are invariant. Further show that the eigenvalues of the linear map represented 
by the matrix $T^i{}_j$ can be found by solving the cubic equation 

$$ \lambda^3 - a \lambda^2 + \tfrac{1}{2} (a^2 - b) \lambda - \tfrac{1}{6} (a^3 - 3ab + 2c) = 0. $$
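
A quick numerical sanity check of the cubic can be reassuring before attempting the proof. The sketch below is our own illustration rather than part of the exercise: it builds a, b and c as the traces of T, T² and T³ for an upper-triangular matrix whose eigenvalues (2, 3 and 5) can be read off the diagonal. The helper names `matmul`, `trace`, `det3` and `cubic` are hypothetical.

```python
def matmul(A, B):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def trace(A):
    return sum(A[i][i] for i in range(3))


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


# Upper-triangular example: the eigenvalues are the diagonal entries 2, 3, 5.
T = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, 5.0]]

T2 = matmul(T, T)
T3 = matmul(T2, T)
a, b, c = trace(T), trace(T2), trace(T3)


def cubic(lam):
    """Left-hand side of the cubic equation quoted in the exercise."""
    return (lam ** 3 - a * lam ** 2 + 0.5 * (a * a - b) * lam
            - (a ** 3 - 3.0 * a * b + 2.0 * c) / 6.0)


for lam in (2.0, 3.0, 5.0):
    print(lam, cubic(lam))
print(det3(T), (a ** 3 - 3.0 * a * b + 2.0 * c) / 6.0)
```

The last line compares the determinant with the constant term of the cubic, which is one way to organize the proof.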


Exercise 10.12: Let the covariant tensor $R_{ijkl}$ possess the following symmetries: 

i) $R_{ijkl} = -R_{jikl}$, 
ii) $R_{ijkl} = -R_{ijlk}$, 
iii) $R_{ijkl} + R_{iklj} + R_{iljk} = 0$. 

Use the properties i), ii), iii) to show that: 

a) $R_{ijkl} = R_{klij}$. 

b) If $R_{ijkl} x^i y^j x^k y^l = 0$ for all vectors $x^i$, $y^i$, then $R_{ijkl} = 0$. 

c) If $B_{ij}$ is a symmetric covariant tensor and we set $A_{ijkl} = B_{ik} B_{jl} - B_{il} B_{jk}$, 
then $A_{ijkl}$ has the same symmetries as $R_{ijkl}$. 


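Exercise 10.13: Write out Euler’s equation for fluid motion 

$$ \dot{\mathbf{v}} + (\mathbf{v} \cdot \nabla) \mathbf{v} = -\nabla h $$

in Cartesian tensor notation. Transform it into 

$$ \dot{\mathbf{v}} - \mathbf{v} \times \boldsymbol{\omega} = -\nabla \left( \tfrac{1}{2} \mathbf{v}^2 + h \right), $$

where $\boldsymbol{\omega} = \nabla \times \mathbf{v}$ is the vorticity. Deduce Bernoulli’s theorem, that for steady 
($\dot{\mathbf{v}} = 0$) flow the quantity $\tfrac{1}{2} \mathbf{v}^2 + h$ is constant along streamlines. 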


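Exercise 10.14: The elastic properties of an infinite homogeneous and isotropic 
solid of density $\rho$ are described by Lamé constants $\lambda$ and $\mu$. Show that the 
equation of motion for small-amplitude vibrations is 

$$ \rho \frac{\partial^2 \eta_i}{\partial t^2} = (\lambda + \mu) \frac{\partial^2 \eta_j}{\partial x_i \partial x_j} + \mu \frac{\partial^2 \eta_i}{\partial x_j^2}. $$

Here $\eta_i$ are the cartesian components of the displacement vector $\boldsymbol{\eta}(\mathbf{x}, t)$ of the 
particle initially at the point $\mathbf{x}$. Seek plane wave solutions of the form 

$$ \boldsymbol{\eta} = \mathbf{a} \exp\{ i \mathbf{k} \cdot \mathbf{x} - i \omega t \}, $$

and deduce that there are two possible types of wave: longitudinal “P-waves,” 
which have phase velocity 

$$ v_P = \sqrt{\frac{\lambda + 2\mu}{\rho}}, $$

and transverse “S-waves,” which have phase velocity 

$$ v_S = \sqrt{\frac{\mu}{\rho}}. $$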


Exercise 10.15: Symmetric integration. Show that the n-dimensional integral 

$$ I_{\alpha\beta\gamma\delta} = \int \frac{d^n k}{(2\pi)^n} \, (k_\alpha k_\beta k_\gamma k_\delta) \, f(k^2), $$

is equal to 

$$ A (\delta_{\alpha\beta} \delta_{\gamma\delta} + \delta_{\alpha\gamma} \delta_{\beta\delta} + \delta_{\alpha\delta} \delta_{\beta\gamma}) $$

where 

$$ A = \frac{1}{n(n+2)} \int \frac{d^n k}{(2\pi)^n} \, (k^2)^2 f(k^2). $$

Similarly evaluate 

$$ I_{\alpha\beta\gamma\delta\epsilon} = \int \frac{d^n k}{(2\pi)^n} \, (k_\alpha k_\beta k_\gamma k_\delta k_\epsilon) \, f(k^2). $$


Exercise 10.16: Write down the most general three-dimensional isotropic 
tensors of rank two and three. 

In piezoelectric materials, the application of an electric field $E_i$ induces a 
mechanical strain that is described by a rank-two symmetric tensor 

$$ e_{ij} = d_{ijk} E_k, $$

where $d_{ijk}$ is a third-rank tensor that depends only on the material. Show 
that $e_{ij}$ can only be non-zero in an anisotropic material. 


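Exercise 10.17: In three dimensions, a rank-five isotropic tensor $T_{ijklm}$ is a 
linear combination of expressions of the form $\epsilon_{i_1 i_2 i_3} \delta_{i_4 i_5}$ for some assignment 
of the indices $i, j, k, l, m$ to the $i_1, \ldots, i_5$. Show that, on taking into account 
the symmetries of the Kronecker and Levi-Civita symbols, we can construct 
ten distinct products $\epsilon_{i_1 i_2 i_3} \delta_{i_4 i_5}$. Only six of these are linearly independent, 
however. Show, for example, that 

$$ \epsilon_{ijk} \delta_{lm} - \epsilon_{jkl} \delta_{im} + \epsilon_{kli} \delta_{jm} - \epsilon_{lij} \delta_{km} = 0, $$

and find the three other independent relations of this sort.* 

(Hint: Begin by showing that, in three dimensions, 

$$
\delta^{i_1 i_2 i_3 i_4}_{i_5 i_6 i_7 i_8} \stackrel{\rm def}{=}
\begin{vmatrix}
\delta_{i_1 i_5} & \delta_{i_1 i_6} & \delta_{i_1 i_7} & \delta_{i_1 i_8} \\
\delta_{i_2 i_5} & \delta_{i_2 i_6} & \delta_{i_2 i_7} & \delta_{i_2 i_8} \\
\delta_{i_3 i_5} & \delta_{i_3 i_6} & \delta_{i_3 i_7} & \delta_{i_3 i_8} \\
\delta_{i_4 i_5} & \delta_{i_4 i_6} & \delta_{i_4 i_7} & \delta_{i_4 i_8}
\end{vmatrix} = 0,
$$

and contract with $\epsilon_{i_6 i_7 i_8}$.) 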


Problem 10.18: The Plücker Relations. This problem provides a challenging 
test of your understanding of linear algebra. It leads you through the task of 
deriving the necessary and sufficient conditions for 

$$ A = A^{i_1 \ldots i_k} \, e_{i_1} \wedge \ldots \wedge e_{i_k} \in \Lambda^k V $$

to be decomposable as 

$$ A = f_1 \wedge f_2 \wedge \ldots \wedge f_k. $$


The trick is to introduce two subspaces of V, 


* Such relations are called syzygies. A recipe for constructing linearly independent basis 
sets of isotropic tensors can be found in: G. F. Smith, Tensor, 19 (1968) 79-88. 

i) $W$, the smallest subspace of $V$ such that $A \in \Lambda^k W$, 
ii) $W' = \{ v \in V : v \wedge A = 0 \}$, 


and explore their relationship. 


a) Show that if $\{w_1, w_2, \ldots, w_n\}$ constitute a basis for $W'$, then 

$$ A = w_1 \wedge w_2 \wedge \cdots \wedge w_n \wedge \varphi $$

for some $\varphi \in \Lambda^{k-n} V$. Conclude that $W' \subseteq W$, and that equality 
holds if and only if $A$ is decomposable, in which case $W = W' = 
{\rm span}\{f_1 \ldots f_k\}$. 

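b) Now show that $W$ is the image space of $\Lambda^{k-1} V^*$ under the map that 
takes 

$$ \Xi = \Xi_{i_1 \ldots i_{k-1}} \, e^{*i_1} \wedge \ldots \wedge e^{*i_{k-1}} \in \Lambda^{k-1} V^* $$

to 

$$ i(\Xi) A \stackrel{\rm def}{=} \Xi_{i_1 \ldots i_{k-1}} A^{i_1 \ldots i_{k-1} j} \, e_j \in V. $$

Deduce that the condition $W \subseteq W'$ is that 

$$ \left( i(\Xi) A \right) \wedge A = 0, \qquad \forall \, \Xi \in \Lambda^{k-1} V^*. $$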


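c) By taking 

$$ \Xi = e^{*i_1} \wedge \ldots \wedge e^{*i_{k-1}}, $$

show that the condition in part b) can be written as 

$$ A^{i_1 \ldots i_{k-1} j_1} A^{j_2 j_3 \ldots j_{k+1}} \, e_{j_1} \wedge \ldots \wedge e_{j_{k+1}} = 0. $$

Deduce that the necessary and sufficient conditions for decomposability 
are that 

$$ A^{i_1 \ldots i_{k-1} [ j_1} A^{j_2 j_3 \ldots j_{k+1} ]} = 0, $$

for all possible index sets $i_1, \ldots, i_{k-1}$, $j_1, \ldots, j_{k+1}$. Here $[\ldots]$ denotes 
antisymmetrization of the enclosed indices. 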


Chapter 11 


Differential Calculus on 
Manifolds 


In this section we will apply what we have learned about vectors and ten- 
sors in linear algebra to vector and tensor fields in a general curvilinear 
co-ordinate system. Our aim is to introduce the reader to the modern lan- 
guage of advanced calculus, and in particular to the calculus of differential 
forms on surfaces and manifolds. 


11.1 Vector and covector fields 


Vector fields — electric, magnetic, velocity fields, and so on — appear every- 
where in physics. After perhaps struggling with it in introductory courses, we 
rather take the field concept for granted. There remain subtleties, however. 
Consider an electric field. It makes sense to add two field vectors at a single 
point, but there is no physical meaning to the sum of field vectors $\mathbf{E}(x_1)$ and 
$\mathbf{E}(x_2)$ at two distinct points. We should therefore regard all possible electric 
fields at a single point as living in a vector space, but each different point in 
space comes with its own field-vector space. 

This view seems even more reasonable when we consider velocity vectors 
describing motion on a curved surface. A velocity vector lives in the tangent 
space to the surface at each point, and each of these spaces is a differently 
oriented subspace of the higher-dimensional ambient space. 


Figure 11.1: Each point on a surface has its own vector space of tangents. 


Mathematicians call such a collection of vector spaces — one for each of the 
points in a surface — a vector bundle over the surface. Thus, the tangent 
bundle over a surface is the totality of all vector spaces tangent to the surface. 
Why a bundle? This word is used because the individual tangent spaces are 
not completely independent, but are tied together in a rather non-obvious 
way. Try to construct a smooth field of unit vectors tangent to the surface 
of a sphere. However hard you work you will end up in trouble somewhere. 
You cannot comb a hairy ball. On the surface of a torus you will have no such 
problems. You can comb a hairy doughnut. The tangent spaces collectively 
know something about the surface they are tangent to. 

Although we spoke in the previous paragraph of vectors tangent to a 
curved surface, it is useful to generalize this idea to vectors lying in the 
tangent space of an n-dimensional manifold, or n-manifold. An n-manifold $M$ 
is essentially a space that locally looks like a part of $\mathbb{R}^n$. This means that 
some open neighbourhood of each point $x \in M$ can be parametrized by an 
n-dimensional co-ordinate system. A co-ordinate parametrization is called a 
chart. Unless $M$ is $\mathbb{R}^n$ itself (or part of it), a chart will cover only part of 
$M$, and more than one will be required for complete coverage. Where a pair 
of charts overlap, we demand that the transformation formula giving one set 
of co-ordinates as a function of the other be a smooth ($C^\infty$) function, and 
possess a smooth inverse.¹ A collection of such smoothly related co-ordinate 
charts covering all of $M$ is called an atlas. The advantage of thinking in 
terms of manifolds is that we do not have to understand their properties 
as arising from some embedding in a higher dimensional space. Whatever 
structure they have, they possess in, and of, themselves. 


¹ A formal definition of a manifold contains some further technical restrictions (that the 
space be Hausdorff and paracompact) that are designed to eliminate pathologies. We are 
more interested in doing calculus than in proving theorems, and so we will ignore these 
niceties. 
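
To make the notions of chart and atlas concrete, here is a small sketch of our own (it does not appear in the original text) for the unit circle. It uses two stereographic charts, one omitting the north pole and one omitting the south pole; on their overlap the transition function is u ↦ 1/u, which is smooth with a smooth inverse because u never vanishes there. The function names are hypothetical.

```python
import math


def chart_north(x, y):
    """Stereographic co-ordinate u, valid everywhere except the north pole (0, 1)."""
    return x / (1.0 - y)


def chart_south(x, y):
    """Stereographic co-ordinate v, valid everywhere except the south pole (0, -1)."""
    return x / (1.0 + y)


def transition(u):
    """Change of co-ordinates v(u) on the overlap of the two charts (u != 0)."""
    return 1.0 / u


# Sample points on the circle away from both poles.
for angle in (0.3, 1.0, 2.5, 4.0):
    x, y = math.cos(angle), math.sin(angle)
    u, v = chart_north(x, y), chart_south(x, y)
    print(f"u = {u:+.4f}   v = {v:+.4f}   1/u = {transition(u):+.4f}")
```

Neither chart covers the whole circle, but the two together do, so they form an atlas.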
Classical mechanics provides a familiar illustration of these ideas. Except 
in pathological cases, the configuration space $M$ of a mechanical system is 
a manifold. When the system has $n$ degrees of freedom we use generalized 
co-ordinates $q^i$, $i = 1, \ldots, n$ to parametrize $M$. The tangent bundle of $M$ 
then provides the setting for Lagrangian mechanics. This bundle, denoted 
by $TM$, is the 2n-dimensional space each of whose points consists of a 
point $q = (q^1, \ldots, q^n)$ in $M$ paired with a tangent vector lying in the tangent 
space $TM_q$ at that point. If we think of the tangent vector as a velocity, the 
natural co-ordinates on $TM$ become $(q^1, q^2, \ldots, q^n; \dot{q}^1, \dot{q}^2, \ldots, \dot{q}^n)$, and these 
are the variables that appear in the Lagrangian of the system. 

If we consider a vector tangent to some curved surface, it will stick out 
of it. If we have a vector tangent to a manifold, it is a straight arrow lying 
atop bent co-ordinates. Should we restrict the length of the vector so that 
it does not stick out too far? Are we restricted to only infinitesimal vectors? 
It is best to avoid all this by adopting a clever notion of what a vector in 
a tangent space is. The idea is to focus on a well-defined object such as 
a derivative. Suppose that our space has co-ordinates $x^\mu$. (These are not 
the contravariant components of some vector.) A directional derivative is an 
object such as $X^\mu \partial_\mu$, where $\partial_\mu$ is shorthand for $\partial/\partial x^\mu$. When the components 
$X^\mu$ are functions of the co-ordinates $x^\sigma$, this object is called a tangent-vector 
field, and we write² 

$$ X = X^\mu \partial_\mu. \tag{11.1} $$

We regard the $\partial_\mu$ at a point $x$ as a basis for $TM_x$, the tangent-vector space at 
$x$, and the $X^\mu(x)$ as the (contravariant) components of the vector $X$ at that 
point. Although they are not little arrows, what the $\partial_\mu$ are is mathematically 
clear, and so we know perfectly well how to deal with them. 

When we change co-ordinate system from $x^\mu$ to $z^\nu$ by regarding the $x^\mu$’s 
as invertible functions of the $z^\nu$’s, i.e. 

$$ x^\mu = x^\mu(z^1, z^2, \ldots, z^n), \tag{11.2} $$

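² We are going to stop using bold symbols to distinguish between intrinsic objects and 
their components, because from now on almost everything will be something other than a 
number, and too much black ink would just be confusing. 

then the chain rule for partial differentiation gives 

$$ \partial_\mu \equiv \frac{\partial}{\partial x^\mu} = \frac{\partial z^\nu}{\partial x^\mu} \frac{\partial}{\partial z^\nu} = \left( \frac{\partial z^\nu}{\partial x^\mu} \right) \partial'_\nu, \tag{11.3} $$

where $\partial'_\nu$ is shorthand for $\partial/\partial z^\nu$. By demanding that 

$$ X = X^\mu \partial_\mu = X'^\nu \partial'_\nu \tag{11.4} $$

we find the components in the $z^\nu$ co-ordinate frame to be 

$$ X'^\nu = \left( \frac{\partial z^\nu}{\partial x^\mu} \right) X^\mu. \tag{11.5} $$

Conversely, using 

$$ \frac{\partial x^\sigma}{\partial z^\nu} \frac{\partial z^\nu}{\partial x^\mu} = \frac{\partial x^\sigma}{\partial x^\mu} = \delta^\sigma_\mu, \tag{11.6} $$

we have 

$$ X^\nu = \left( \frac{\partial x^\nu}{\partial z^\mu} \right) X'^\mu. \tag{11.7} $$

This, then, is the transformation law for a contravariant vector. 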
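
The transformation law (11.5) is easy to try out numerically. The sketch below is an illustration of ours, not part of the original text: it takes the x co-ordinates to be Cartesian (x, y) and the z co-ordinates to be plane polars (r, φ), estimates the Jacobian ∂z/∂x by central differences, and applies it to the rigid-rotation field X = −y ∂_x + x ∂_y. The function names and the sample point are assumptions.

```python
import math


def to_polar(p):
    """Map Cartesian co-ordinates (x, y) to plane polar co-ordinates (r, phi)."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))


def jacobian(f, p, h=1e-6):
    """Estimate J[nu][mu] = d z^nu / d x^mu at p by central differences."""
    n = len(p)
    m = len(f(p))
    J = [[0.0] * n for _ in range(m)]
    for mu in range(n):
        plus, minus = list(p), list(p)
        plus[mu] += h
        minus[mu] -= h
        f_plus, f_minus = f(plus), f(minus)
        for nu in range(m):
            J[nu][mu] = (f_plus[nu] - f_minus[nu]) / (2.0 * h)
    return J


def transform_components(f, p, X):
    """Apply X'^nu = (d z^nu / d x^mu) X^mu at the point p."""
    J = jacobian(f, p)
    return [sum(J[nu][mu] * X[mu] for mu in range(len(X))) for nu in range(len(J))]


p = [1.2, 0.7]
X = [-p[1], p[0]]                  # rigid rotation: X = -y d_x + x d_y
X_polar = transform_components(to_polar, p, X)
print(X_polar)                     # approximately [0, 1], i.e. X = d_phi
```

In polar co-ordinates the rotation field has the single component X′^φ = 1, so it is simply ∂_φ.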

It is worth pointing out that the basis vectors $\partial_\mu$ are not unit vectors. As 
we have no metric, and therefore no notion of length anyway, we cannot try 
to normalize them. If you insist on drawing (small?) arrows, think of $\partial_1$ as 
starting at a point $(x^1, x^2, \ldots, x^n)$ and with its head at $(x^1 + 1, x^2, \ldots, x^n)$. 
Of course this is only a good picture if the co-ordinates are not too “curvy.” 


Figure 11.2: Approximate picture of the vectors $\partial_1$ and $\partial_2$ at the point 
$(x^1, x^2) = (2, 4)$. 


Example: The surface of the unit sphere is a manifold. It is usually denoted 
by $S^2$. We may label its points with spherical polar co-ordinates, $\theta$ 
measuring the co-latitude and $\phi$ measuring the longitude. These will be useful 
everywhere except at the north and south poles, where they become singular 
because at $\theta = 0$ or $\pi$ all values of the longitude $\phi$ correspond to the same 
point. In this co-ordinate basis, the tangent vector representing the velocity 
field due to a rigid rotation of one radian per second about the z axis is 

$$ V_z = \partial_\phi. \tag{11.8} $$

Similarly 

$$
\begin{aligned}
V_x &= -\sin\phi \, \partial_\theta - \cot\theta \cos\phi \, \partial_\phi, \\
V_y &= \phantom{-}\cos\phi \, \partial_\theta - \cot\theta \sin\phi \, \partial_\phi,
\end{aligned}
\tag{11.9}
$$

respectively represent rigid rotations about the x and y axes. 

We now know how to think about vectors. What about their dual-space 
partners, the covectors? These live in the cotangent bundle $T^*M$, and for 
them a cute notational game, due to Élie Cartan, is played. We write the 
basis vectors dual to the $\partial_\mu$ as $dx^\mu(\ )$. Thus 

$$ dx^\mu(\partial_\nu) = \delta^\mu_\nu. \tag{11.10} $$

When evaluated on a vector field $X = X^\mu \partial_\mu$, the basis covectors $dx^\mu$ return 
its components: 

$$ dx^\mu(X) = dx^\mu(X^\nu \partial_\nu) = X^\nu dx^\mu(\partial_\nu) = X^\nu \delta^\mu_\nu = X^\mu. \tag{11.11} $$


Now, any smooth function $f \in C^\infty(M)$ will give rise to a field of covectors 
in $T^*M$. This is because a vector field $X$ acts on the scalar function $f$ as 

$$ Xf = X^\mu \partial_\mu f \tag{11.12} $$

and $Xf$ is another scalar function. This new function gives a number (and 
thus an element of the field $\mathbb{R}$) at each point $x \in M$. But this is exactly 
what a covector does: it takes in a vector at a point and returns a number. 
We will call this covector field “$df$.” It is essentially the gradient of $f$. Thus 

$$ df(X) \stackrel{\rm def}{=} Xf = X^\mu \frac{\partial f}{\partial x^\mu}. \tag{11.13} $$

If we take $f$ to be the co-ordinate $x^\nu$, we have 

$$ dx^\nu(X) = X^\mu \frac{\partial x^\nu}{\partial x^\mu} = X^\mu \delta^\nu_\mu = X^\nu, \tag{11.14} $$

so this viewpoint is consistent with our previous definition of $dx^\nu$. Thus 

$$ df(X) = \frac{\partial f}{\partial x^\mu} X^\mu = \frac{\partial f}{\partial x^\mu} dx^\mu(X) \tag{11.15} $$

for any vector field $X$. In other words, we can expand $df$ as 

$$ df = \frac{\partial f}{\partial x^\mu} dx^\mu. \tag{11.16} $$

This is not some approximation to a change in $f$, but is an exact expansion 
of the covector field $df$ in terms of the basis covectors $dx^\mu$. 

We may retain something of the notion that $dx^\mu$ represents the (contravariant) 
components of a small displacement in $x$ provided that we think 
of $dx^\mu$ as a machine into which we insert the small displacement (a vector) 
and have it spit out the numerical components $\delta x^\mu$. This is the same 
distinction that we make between $\sin(\ )$ as a function into which one can plug 
$x$, and $\sin x$, the number that results from inserting this particular value 
of $x$. Although seemingly innocent, we know that it is a distinction of great 
power. 

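The change of co-ordinates transformation law for a covector field $f_\mu$ is 
found from 

$$ f_\mu \, dx^\mu = f'_\nu \, dz^\nu \tag{11.17} $$

by using 

$$ dx^\mu = \left( \frac{\partial x^\mu}{\partial z^\nu} \right) dz^\nu. \tag{11.18} $$

We find 

$$ f'_\nu = \left( \frac{\partial x^\mu}{\partial z^\nu} \right) f_\mu. \tag{11.19} $$

A general tensor such as $Q^{\lambda\mu}{}_{\rho\sigma\tau}$ transforms as 

$$ Q'^{\lambda\mu}{}_{\rho\sigma\tau}(z) = \frac{\partial z^\lambda}{\partial x^\alpha} \frac{\partial z^\mu}{\partial x^\beta} \frac{\partial x^\gamma}{\partial z^\rho} \frac{\partial x^\delta}{\partial z^\sigma} \frac{\partial x^\epsilon}{\partial z^\tau} \, Q^{\alpha\beta}{}_{\gamma\delta\epsilon}(x). \tag{11.20} $$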

Observe how the indices are wired up: Those for the new tensor coefficients 
in the new co-ordinates, z, are attached to the new z’s, and those for the old 
coefficients are attached to the old x’s. Upstairs indices go in the numerator 
of each partial derivative, and downstairs ones are in the denominator. 


The language of bundles and sections 


At the beginning of this section, we introduced the notion of a vector bundle. 
This is a particular example of the more general concept of a fibre bundle, 
where the vector space at each point in the manifold is replaced by a “fibre” 
over that point. The fibre can be any mathematical object, such as a set, 
tensor space, or another manifold. Mathematicians visualize the bundle as 
a collection of fibres growing out of the manifold, much as stalks of wheat 
grow out of the soil. When one slices through a patch of wheat with a scythe, 
the blade exposes a cross-section of the stalks. By analogy, a choice of an 
element of the fibre over each point in the manifold is called a 
cross-section, or, more commonly, a section of the bundle. In this language, a 
tangent-vector field becomes a section of the tangent bundle, and a field of 
covectors becomes a section of the cotangent bundle. 
We provide a more detailed account of bundles in Chapter 16. 


11.2 Differentiating tensors 


If $f$ is a function then $\partial_\mu f$ are components of the covariant vector $df$. Suppose 
that $a^\mu$ is a contravariant vector. Are $\partial_\nu a^\mu$ the components of a type (1, 1) 
tensor? The answer is no! In general, differentiating the components of a 
tensor does not give rise to another tensor. One can see why at two levels: 

a) Consider the transformation laws. They contain expressions of the form 
$\partial x^\mu / \partial z^\nu$. If we differentiate both sides of the transformation law of a 
tensor, these factors are also differentiated, but tensor transformation 
laws never contain second derivatives, such as $\partial^2 x^\mu / \partial z^\nu \partial z^\sigma$. 

b) Differentiation requires subtracting vectors or tensors at different points 
— but vectors at different points are in different vector spaces, so their 
difference is not defined. 

These two reasons are really one and the same. We need to be cleverer to 
get new tensors by differentiating old ones. 


11.2.1 Lie bracket 


One way to proceed is to note that the vector field $X$ is an operator. It makes 
sense, therefore, to try to compose two of them to make another. Look at 
$XY$, for example: 

$$ XY = X^\mu \partial_\mu (Y^\nu \partial_\nu) = X^\mu Y^\nu \partial^2_{\mu\nu} + X^\mu \left( \frac{\partial Y^\nu}{\partial x^\mu} \right) \partial_\nu. \tag{11.21} $$

What are we to make of this? Not much! There is no particular interpretation 
for the second derivative, and as we saw above, it does not transform nicely. 
But suppose we take a commutator: 

$$ [X, Y] = XY - YX = \left( X^\mu (\partial_\mu Y^\nu) - Y^\mu (\partial_\mu X^\nu) \right) \partial_\nu. \tag{11.22} $$

The second derivatives have cancelled, and what remains is a directional 
derivative and so a bona-fide vector field. The components 

$$ [X, Y]^\nu = X^\mu (\partial_\mu Y^\nu) - Y^\mu (\partial_\mu X^\nu) \tag{11.23} $$


are the components of a new contravariant vector field made from the two 
old vector fields. This new vector field is called the Lie bracket of the two 
fields, and has a geometric interpretation. 


To understand the geometry of the Lie bracket, we first define the flow 
associated with a tangent-vector field $X$. This is the map that takes a point 
$x_0$ and maps it to $x(t)$ by solving the family of equations 

$$ \frac{dx^\mu}{dt} = X^\mu(x^1, x^2, \ldots), \tag{11.24} $$

with initial condition $x^\mu(0) = x^\mu_0$. In words, we regard $X$ as the velocity field 
of a flowing fluid, and let $x$ ride along with the fluid. 

Now envisage $X$ and $Y$ as two velocity fields. Suppose we flow along $X$ 
for a brief time $t$, then along $Y$ for another brief interval $s$. Next we switch 
back to $X$, but with a minus sign, for time $t$, and then to $-Y$ for a final 
interval of $s$. We have tried to retrace our path, but a short exercise with 
Taylor’s theorem shows that we will fail to return to our exact starting point. 
We will miss by $\delta x^\mu = st [X, Y]^\mu$, plus corrections of cubic order in $s$ and $t$. 
(See figure 11.3.) 


Figure 11.3: We try to retrace our steps but fail to return by a distance 
proportional to the Lie bracket. 
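
The failure to close can be seen in a small numerical experiment. The following sketch is ours and rests on illustrative choices: the plane fields X = ∂_x and Y = x ∂_y, for which [X, Y] = ∂_y, and a hand-written fourth-order Runge-Kutta integrator for the flow equations (11.24). The names `rk4_flow` and `loop_miss` are hypothetical.

```python
def rk4_flow(field, p, t, steps=200):
    """Flow the point p along the vector field for parameter time t (t may be negative)."""
    h = t / steps
    p = list(p)
    for _ in range(steps):
        k1 = field(p)
        k2 = field([pi + 0.5 * h * ki for pi, ki in zip(p, k1)])
        k3 = field([pi + 0.5 * h * ki for pi, ki in zip(p, k2)])
        k4 = field([pi + h * ki for pi, ki in zip(p, k3)])
        p = [pi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for pi, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def X(p):
    """X = d_x."""
    return [1.0, 0.0]


def Y(p):
    """Y = x d_y, chosen so that [X, Y] = d_y."""
    return [0.0, p[0]]


def loop_miss(p0, t, s):
    """Flow along X, then Y, then -X, then -Y, and return the net displacement."""
    p = rk4_flow(X, p0, t)
    p = rk4_flow(Y, p, s)
    p = rk4_flow(X, p, -t)     # flowing along -X for time t
    p = rk4_flow(Y, p, -s)     # flowing along -Y for time s
    return [a - b for a, b in zip(p, p0)]


t, s = 0.01, 0.02
miss = loop_miss([0.5, -0.3], t, s)
print(miss, "compare with s*t*[X,Y] =", [0.0, s * t])
```

For this particular pair of fields the higher-order corrections happen to vanish, so the miss equals st[X, Y] up to rounding error. For generic fields the agreement holds only to leading order in s and t.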


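Example: Let 

$$
\begin{aligned}
V_x &= -\sin\phi \, \partial_\theta - \cot\theta \cos\phi \, \partial_\phi, \\
V_y &= \phantom{-}\cos\phi \, \partial_\theta - \cot\theta \sin\phi \, \partial_\phi,
\end{aligned}
$$

be two vector fields in $T(S^2)$. We find that 

$$ [V_x, V_y] = -V_z, $$

where $V_z = \partial_\phi$. 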
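
This bracket can be confirmed numerically from the component formula (11.23). The sketch below is our own check, with the partial derivatives estimated by central differences. The helper name `lie_bracket` and the sample point (θ, φ) = (1.1, 0.4) are assumptions.

```python
import math


def V_x(p):
    """Components (theta, phi) of the rotation field about the x axis."""
    theta, phi = p
    return [-math.sin(phi), -math.cos(phi) / math.tan(theta)]


def V_y(p):
    """Components (theta, phi) of the rotation field about the y axis."""
    theta, phi = p
    return [math.cos(phi), -math.sin(phi) / math.tan(theta)]


def lie_bracket(X, Y, p, h=1e-5):
    """[X, Y]^nu = X^mu d_mu Y^nu - Y^mu d_mu X^nu, with central-difference derivatives."""
    n = len(p)
    Xp, Yp = X(p), Y(p)
    result = [0.0] * n
    for mu in range(n):
        plus, minus = list(p), list(p)
        plus[mu] += h
        minus[mu] -= h
        dY = [(a - b) / (2.0 * h) for a, b in zip(Y(plus), Y(minus))]
        dX = [(a - b) / (2.0 * h) for a, b in zip(X(plus), X(minus))]
        for nu in range(n):
            result[nu] += Xp[mu] * dY[nu] - Yp[mu] * dX[nu]
    return result


bracket = lie_bracket(V_x, V_y, [1.1, 0.4])
print(bracket)   # approximately [0, -1], the components of -V_z
```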


Frobenius’ theorem 


Suppose that in some region of a d-dimensional manifold $M$ we are given 
$n < d$ linearly independent tangent-vector fields $X_i$. Such a set is called a 
distribution by differential geometers. (The concept has nothing to do with 
probability, or with objects like “$\delta(x)$” which are also called “distributions.”) 
At each point $x$, the span $\langle X_i(x) \rangle$ of the field vectors forms a subspace of 
the tangent space $TM_x$, and we can picture this subspace as a fragment of 
an n-dimensional surface passing through $x$. It is possible that these surface 
fragments fit together to make a stack of smooth surfaces, called a foliation, 
that fill out the d-dimensional space, and have the given $X_i$ as their tangent 
vectors. 

Figure 11.4: A local foliation. 

If this is the case then starting from $x$ and taking steps only along the $X_i$ 
we find ourselves restricted to the n-surface, or n-submanifold, $N$ passing 
through the original point $x$. 

Alternatively, the surface fragments may form such an incoherent jumble 
that starting from $x$ and moving only along the $X_i$ we can find our way to any 
point in the neighbourhood of $x$. It is also possible that some intermediate 
case applies, so that moving along the $X_i$ restricts us to an m-surface, where 
$d > m > n$. The Lie bracket provides us with the appropriate tool with 
which to investigate these possibilities. 

First a definition: If there are functions $c_{ij}{}^k(x)$ such that 

$$ [X_i, X_j] = c_{ij}{}^k(x) X_k, \tag{11.25} $$

i.e. the Lie brackets close within the set $\{X_i\}$ at each point $x$, then the 
distribution is said to be involutive, and the vector fields are said to be “in 
involution” with each other. When our given distribution is involutive, then 
the first case holds, and, at least locally, there is a foliation by n-submanifolds 
$N$. A formal statement of this is: 

Theorem (Frobenius): A smooth ($C^\infty$) involutive distribution is completely 
integrable: locally, there are co-ordinates $x^\mu$, $\mu = 1, \ldots, d$ such that $X_i = 
\sum_{\mu=1}^{n} X_i^\mu \partial_\mu$, and the surfaces $N$ through each point are in the form $x^\mu = 
{\rm const.}$ for $\mu = n+1, \ldots, d$. Conversely, if such co-ordinates exist then the 
distribution is involutive. 

A half-proof: If such co-ordinates exist then it is obvious that the Lie bracket 
of any pair of vectors in the form $X_i = \sum_{\mu=1}^{n} X_i^\mu \partial_\mu$ can also be expanded in 
terms of the first $n$ basis vectors. A logically equivalent statement exploits the 
geometric interpretation of the Lie bracket: If the Lie brackets of the fields 
$X_i$ do not close within the n-dimensional span of the $X_i$, then a sequence 
of back-and-forth manoeuvres along the $X_i$ allows us to escape into a new 
direction, and so the $X_i$ cannot be tangent to an n-surface. Establishing the 
converse (that closure implies the existence of the foliation) is rather 
more technical, and we will not attempt it. 

Involutive and non-involutive distributions appear in classical mechanics 
under the guise of holonomic and anholonomic constraints. In mechanics, 
constraints are not usually given as a list of the directions (vector fields) in 
which we are free to move, but instead as a list of restrictions imposed on the 
permitted motion. In a d-dimensional mechanical system we might have a set 
of $m$ independent constraints of the form $\omega^i_\mu(q) \dot{q}^\mu = 0$, $i = 1, \ldots, m$. Such 
restrictions are most naturally expressed in terms of the covector fields 

$$ \omega^i = \sum_{\mu=1}^{d} \omega^i_\mu(q) \, dq^\mu, \qquad 1 \le i \le m. \tag{11.26} $$

We can write the constraints as the $m$ conditions $\omega^i(\dot{q}) = 0$ that must be 
satisfied if $\dot{q} \equiv \dot{q}^\mu \partial_\mu$ is to be an allowed motion. The list of constraints is 
known as a Pfaffian system of equations. These equations indirectly determine 
an $n = d - m$ dimensional distribution of permitted motions. The Pfaffian 
system is said to be integrable if this distribution is involutive, and hence 
integrable. In this case there is a set of $m$ functions $g^i(q)$ and an invertible 
m-by-m matrix $f^i{}_j(q)$ such that 

$$ \omega^i = \sum_{j=1}^{m} f^i{}_j(q) \, dg^j. \tag{11.27} $$

The functions $g^i(q)$ can, for example, be taken to be the co-ordinate functions 
$x^\mu$, $\mu = n+1, \ldots, d$, that label the foliating surfaces $N$ in the statement of 
Frobenius’ theorem. The system of integrable constraints $\omega^i(\dot{q}) = 0$ thus 
restricts us to the surfaces $g^i(q) = {\rm constant}$. 

For example, consider a particle moving in three dimensions. If we are 
told that the velocity vector is constrained by $\omega(\dot{q}) = 0$, where 

$$ \omega = x \, dx + y \, dy + z \, dz \tag{11.28} $$

we realize that the particle is being forced to move on a sphere passing 
through the initial point. In spherical co-ordinates the associated distribution 
is the set $\{\partial_\theta, \partial_\phi\}$, which is clearly involutive because $[\partial_\theta, \partial_\phi] = 0$. The 
functions $f(x, y, z)$ and $g(x, y, z)$ from the previous paragraph can be taken 
to be $r = \sqrt{x^2 + y^2 + z^2}$, and the constraint covector written as $\omega = f \, dg = 
r \, dr$. 

The foliation is the family of nested spheres whose centre is the origin. 
(The foliation is not global because it becomes singular at r = 0.) Constraints 
like this, which restrict the motion to a surface, are said to be holonomic. 

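Suppose, on the other hand, we have a ball rolling on a table. Here, we 
have a five-dimensional configuration manifold $M = \mathbb{R}^2 \times S^3$, parametrized 
by the centre of mass $(x, y) \in \mathbb{R}^2$ of the ball and the three Euler angles 
$(\theta, \phi, \psi) \in S^3$ defining its orientation. Three no-slip rolling conditions 

$$
\begin{aligned}
\dot{x} &= \dot{\psi} \sin\theta \sin\phi + \dot{\theta} \cos\phi, \\
\dot{y} &= -\dot{\psi} \sin\theta \cos\phi + \dot{\theta} \sin\phi, \\
0 &= \dot{\psi} \cos\theta + \dot{\phi},
\end{aligned}
\tag{11.29}
$$

(see exercise 11.17) link the rate of change of the Euler angles to the velocity 
of the centre of mass. At each point in this five-dimensional manifold we 
are free to roll the ball in two directions, and so we might expect that the 
reachable configurations constitute a two-dimensional surface embedded in 
the full five-dimensional space. The two vector fields 

$$
\begin{aligned}
{\rm roll}_x &= \partial_x - \sin\phi \cot\theta \, \partial_\phi + \cos\phi \, \partial_\theta + {\rm cosec}\,\theta \sin\phi \, \partial_\psi, \\
{\rm roll}_y &= \partial_y + \cos\phi \cot\theta \, \partial_\phi + \sin\phi \, \partial_\theta - {\rm cosec}\,\theta \cos\phi \, \partial_\psi,
\end{aligned}
\tag{11.30}
$$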


describing the permitted x- and y-direction rolling motion are not in invo- 
lution, however. By calculating enough Lie brackets we eventually obtain 
five linearly independent velocity vector fields, and starting from one con- 
figuration we can reach any other. The no-slip rolling condition is said to 
be non-integrable, or anholonomic. Such systems are tricky to deal with in 
Lagrangian dynamics. 
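
As a consistency check between (11.29) and (11.30), the short sketch below (ours, not part of the original text) reads off the velocity components of the two rolling fields and substitutes them into the three no-slip conditions. The dictionary layout, the function names, and the sample Euler angles are illustrative assumptions.

```python
import math


def roll_x(theta, phi):
    """Velocity components of roll_x, read off from (11.30)."""
    return {"x": 1.0, "y": 0.0,
            "theta": math.cos(phi),
            "phi": -math.sin(phi) / math.tan(theta),
            "psi": math.sin(phi) / math.sin(theta)}


def roll_y(theta, phi):
    """Velocity components of roll_y, read off from (11.30)."""
    return {"x": 0.0, "y": 1.0,
            "theta": math.sin(phi),
            "phi": math.cos(phi) / math.tan(theta),
            "psi": -math.cos(phi) / math.sin(theta)}


def constraint_residuals(v, theta, phi):
    """Residuals of the three no-slip conditions (11.29) for the velocity v."""
    r1 = v["x"] - (v["psi"] * math.sin(theta) * math.sin(phi)
                   + v["theta"] * math.cos(phi))
    r2 = v["y"] - (-v["psi"] * math.sin(theta) * math.cos(phi)
                   + v["theta"] * math.sin(phi))
    r3 = v["psi"] * math.cos(theta) + v["phi"]
    return (r1, r2, r3)


theta, phi = 0.9, 2.3    # arbitrary sample Euler angles away from theta = 0
print(constraint_residuals(roll_x(theta, phi), theta, phi))
print(constraint_residuals(roll_y(theta, phi), theta, phi))
```

Both sets of residuals vanish to rounding error, so each field describes a motion that respects all three constraints. Showing that the fields are not in involution requires computing their Lie brackets, which this sketch does not attempt.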

The following exercise provides a familiar example of the utility of non- 
holonomic constraints: 


Exercise 11.1: Parallel Parking using Lie Brackets. The configuration space 
of a car is four dimensional, and parameterized by co-ordinates $(x, y, \theta, \phi)$, as 
shown in figure 11.5. Define the following vector fields: 

a) (front wheel) drive = $\cos\phi \, (\cos\theta \, \partial_x + \sin\theta \, \partial_y) + \sin\phi \, \partial_\theta$. 
b) steer = $\partial_\phi$. 
c) (front wheel) skid = $-\sin\phi \, (\cos\theta \, \partial_x + \sin\theta \, \partial_y) + \cos\phi \, \partial_\theta$. 
d) park = $-\sin\theta \, \partial_x + \cos\theta \, \partial_y$. 

Figure 11.5: Co-ordinates for car parking. 

Explain why these are apt names for the vector fields, and compute the six 
Lie brackets: 

[steer, drive], [steer, skid], [skid, drive], 
[park, drive], [park, park], [park, skid]. 

The driver can use only the operations (±) drive and (±) steer to manoeuvre 
the car. Use the geometric interpretation of the Lie bracket to explain how a 
suitable sequence of motions (forward, reverse, and turning the steering wheel) 
can be used to manoeuvre a car sideways into a parking space. 


11.2.2 Lie derivative 


Another derivative that we can define is the Lie derivative along a vector 
field $X$. It is defined by its action on a scalar function $f$ as 

$$ \mathcal{L}_X f \stackrel{\rm def}{=} Xf, \tag{11.31} $$

on a vector field by 

$$ \mathcal{L}_X Y \stackrel{\rm def}{=} [X, Y], \tag{11.32} $$

and on anything else by requiring it to be a derivation, meaning that it obeys 
Leibniz’ rule. For example, let us compute the Lie derivative of a covector 
$F$. We first introduce an arbitrary vector field $Y$ and plug it into $F$ to get 
the scalar function $F(Y)$. Leibniz’ rule is then the statement that 

$$ \mathcal{L}_X F(Y) = (\mathcal{L}_X F)(Y) + F(\mathcal{L}_X Y). \tag{11.33} $$


Since $F(Y)$ is a function and $Y$ is a vector, both of whose derivatives we 
know how to compute, we know the first and third of the three terms in this 
equation. From $\mathcal{L}_X F(Y) = X F(Y)$ and $F(\mathcal{L}_X Y) = F([X, Y])$, we have 

$$ X F(Y) = (\mathcal{L}_X F)(Y) + F([X, Y]), \tag{11.34} $$

and so 

$$ (\mathcal{L}_X F)(Y) = X F(Y) - F([X, Y]). \tag{11.35} $$

In components, this becomes 

$$
\begin{aligned}
(\mathcal{L}_X F)(Y) &= X^\nu \partial_\nu (F_\mu Y^\mu) - F_\mu (X^\nu \partial_\nu Y^\mu - Y^\nu \partial_\nu X^\mu) \\
&= (X^\nu \partial_\nu F_\mu + F_\nu \partial_\mu X^\nu) Y^\mu.
\end{aligned}
\tag{11.36}
$$

Note how all the derivatives of $Y^\mu$ have cancelled, so $\mathcal{L}_X F(\ )$ depends only 
on the local value of $Y$. The Lie derivative of $F$ is therefore still a covector 
field. This is true in general: the Lie derivative does not change the tensor 
character of the objects on which it acts. Dropping the passive spectator 
field $Y^\mu$, we have a formula for $\mathcal{L}_X F$ in components: 

$$ (\mathcal{L}_X F)_\mu = X^\nu \partial_\nu F_\mu + F_\nu \partial_\mu X^\nu. \tag{11.37} $$


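Another example is provided by the Lie derivative of a type (0, 2) tensor, 
such as a metric tensor. This is 

$$ (\mathcal{L}_X g)_{\mu\nu} = X^\alpha \partial_\alpha g_{\mu\nu} + g_{\mu\alpha} \partial_\nu X^\alpha + g_{\alpha\nu} \partial_\mu X^\alpha. \tag{11.38} $$

The Lie derivative of a metric measures the extent to which the displacement 
$x^\alpha \to x^\alpha + \epsilon X^\alpha(x)$ deforms the geometry. If we write the metric as 

$$ g(\ ,\ ) = g_{\mu\nu}(x) \, dx^\mu \otimes dx^\nu, \tag{11.39} $$

we can understand both this geometric interpretation and the origin of the 
three terms appearing in the Lie derivative. We simply make the displacement 
$x^\alpha \to x^\alpha + \epsilon X^\alpha$ in the coefficients $g_{\mu\nu}(x)$ and in the two $dx^\alpha$. In the 
latter we write 

$$ d(x^\alpha + \epsilon X^\alpha) = dx^\alpha + \epsilon \frac{\partial X^\alpha}{\partial x^\beta} dx^\beta. \tag{11.40} $$

Then we see that 

$$
\begin{aligned}
g_{\mu\nu}(x) \, dx^\mu \otimes dx^\nu &\to \left[ g_{\mu\nu}(x) + \epsilon \left( X^\alpha \partial_\alpha g_{\mu\nu} + g_{\alpha\nu} \partial_\mu X^\alpha + g_{\mu\alpha} \partial_\nu X^\alpha \right) \right] dx^\mu \otimes dx^\nu \\
&= \left[ g_{\mu\nu} + \epsilon (\mathcal{L}_X g)_{\mu\nu} \right] dx^\mu \otimes dx^\nu.
\end{aligned}
\tag{11.41}
$$

A displacement field $X$ that does not change distances between points, i.e. one 
that gives rise to an isometry, must therefore satisfy $\mathcal{L}_X g = 0$. Such an $X$ is 
said to be a Killing field after Wilhelm Killing who introduced them in his 
study of non-euclidean geometries. 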
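
Formula (11.38) translates directly into a few lines of code. The sketch below is an illustration of ours, with derivatives taken by central differences: it evaluates the Lie derivative of the flat metric of the Euclidean plane along a rigid rotation, which should be a Killing field, and along a uniform dilation, which should not. All function names are hypothetical.

```python
def lie_derivative_metric(g, X, p, h=1e-5):
    """(L_X g)_{mu nu} = X^a d_a g_{mu nu} + g_{mu a} d_nu X^a + g_{a nu} d_mu X^a."""
    n = len(p)
    gp, Xp = g(p), X(p)
    dg = []   # dg[a][mu][nu] approximates d_a g_{mu nu}
    dX = []   # dX[a][b] approximates d_a X^b
    for a in range(n):
        plus, minus = list(p), list(p)
        plus[a] += h
        minus[a] -= h
        g_plus, g_minus = g(plus), g(minus)
        dg.append([[(g_plus[m][k] - g_minus[m][k]) / (2.0 * h) for k in range(n)]
                   for m in range(n)])
        dX.append([(u - v) / (2.0 * h) for u, v in zip(X(plus), X(minus))])
    return [[sum(Xp[a] * dg[a][m][k] + gp[m][a] * dX[k][a] + gp[a][k] * dX[m][a]
                 for a in range(n))
             for k in range(n)]
            for m in range(n)]


def euclidean(p):
    """Flat metric of the plane in Cartesian co-ordinates."""
    return [[1.0, 0.0], [0.0, 1.0]]


def rotation(p):
    """X = -y d_x + x d_y."""
    return [-p[1], p[0]]


def dilation(p):
    """X = x d_x + y d_y."""
    return [p[0], p[1]]


point = [0.8, -1.3]
L_rot = lie_derivative_metric(euclidean, rotation, point)
L_dil = lie_derivative_metric(euclidean, dilation, point)
print(L_rot)   # approximately the zero matrix: rotations are isometries
print(L_dil)   # approximately twice the metric: dilations stretch all lengths
```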

The geometric interpretation of the Lie derivative of a vector field is as 
follows: In order to compute the $X$ directional derivative of a vector field $Y$, 
we need to be able to subtract the vector $Y(x)$ from the vector $Y(x + \epsilon X)$, 
divide by $\epsilon$, and take the limit $\epsilon \to 0$. To do this we have somehow to get the 
vector $Y(x)$ from the point $x$, where it normally resides, to the new point 
$x + \epsilon X$, so both vectors are elements of the same vector space. The Lie 
derivative achieves this by carrying the old vector to the new point along the 
field $X$. 


Figure 11.6: Computing the Lie derivative of a vector. 


Imagine the vector $Y$ as drawn in ink in a flowing fluid whose velocity field 
is $X$. Initially the tail of $Y$ is at $x$ and its head is at $x + Y$. After flowing 
for a time $\epsilon$, its tail is at $x + \epsilon X$, i.e. exactly where the tail of $Y(x + \epsilon X)$ 
lies. Where the head of the transported vector ends up depends on how the flow has 
stretched and rotated the ink, but it is this distorted vector that is subtracted 
from $Y(x + \epsilon X)$ to get $\epsilon \mathcal{L}_X Y = \epsilon [X, Y]$. 


Exercise 11.2: The metric on the unit sphere equipped with polar co-ordinates 
is 

$$ g(\ ,\ ) = d\theta \otimes d\theta + \sin^2\theta \, d\phi \otimes d\phi. $$

Consider 

$$ V_x = -\sin\phi \, \partial_\theta - \cot\theta \cos\phi \, \partial_\phi, $$

which is the vector field of a rigid rotation about the x axis. Compute the Lie 
derivative $\mathcal{L}_{V_x} g$, and show that it is zero. 


Exercise 11.3: Suppose we have an unstrained block of material in real space.
A co-ordinate system $\xi^1$, $\xi^2$, $\xi^3$ is attached to the material of the body. The
point with co-ordinate $\xi$ is located at $(x^1(\xi), x^2(\xi), x^3(\xi))$ where $x^1$, $x^2$, $x^3$ are
the usual $\mathbb{R}^3$ Cartesian co-ordinates.
a) Show that the induced metric in the $\xi$ co-ordinate system is
$$ g_{\mu\nu}(\xi) = \sum_{a=1}^{3} \frac{\partial x^a}{\partial \xi^\mu} \frac{\partial x^a}{\partial \xi^\nu}. $$
b) The body is now deformed by an infinitesimal strain vector field $\eta(\xi)$.
The atom with co-ordinate $\xi^\mu$ is moved to what was $\xi^\mu + \eta^\mu(\xi)$, or equivalently,
the atom initially at Cartesian co-ordinate $x^a(\xi)$ is moved to
$x^a + \eta^\mu\, \partial x^a/\partial \xi^\mu$. Show that the new induced metric is
$$ g_{\mu\nu} + \delta g_{\mu\nu} = g_{\mu\nu} + \mathcal{L}_\eta g_{\mu\nu}. $$
c) Define the strain tensor to be 1/2 of the Lie derivative of the metric
with respect to the deformation. If the original $\xi$ co-ordinate system
coincided with the Cartesian one, show that this definition reduces to
the familiar form
$$ e_{ab} = \frac{1}{2} \left( \frac{\partial \eta_a}{\partial x^b} + \frac{\partial \eta_b}{\partial x^a} \right), $$
all tensors being Cartesian.

d) Part c) gave us the geometric definition of infinitesimal strain. If the
body is deformed substantially, the Cauchy-Green finite strain tensor is
defined as
$$ E_{\mu\nu}(\xi) = \frac{1}{2} \left( g_{\mu\nu} - g^{(0)}_{\mu\nu} \right), $$
where $g^{(0)}_{\mu\nu}$ is the metric in the undeformed body and $g_{\mu\nu}$ the metric in
the deformed body. Explain why this is a reasonable definition.


11.3 Exterior calculus 


11.3.1 Differential forms 


The objects we introduced in section 11.1, the $dx^\mu$, are called one-forms, or
differential one-forms. They are fields living in the cotangent bundle $T^*M$
of $M$. More precisely, they are sections of the cotangent bundle. Sections
of the bundle whose fibre above $x \in M$ is the $p$-th skew-symmetric tensor
power $\bigwedge^p(T^*M_x)$ of the cotangent space are known as $p$-forms.

For example, 


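$$ A = A_\mu dx^\mu = A_1 dx^1 + A_2 dx^2 + A_3 dx^3, \tag{11.42} $$
is a 1-form,
$$ F = \frac{1}{2} F_{\mu\nu}\, dx^\mu \wedge dx^\nu = F_{12}\, dx^1 \wedge dx^2 + F_{23}\, dx^2 \wedge dx^3 + F_{31}\, dx^3 \wedge dx^1, \tag{11.43} $$
is a 2-form, and
$$ \Omega = \frac{1}{3!} \Omega_{\mu\nu\sigma}\, dx^\mu \wedge dx^\nu \wedge dx^\sigma = \Omega_{123}\, dx^1 \wedge dx^2 \wedge dx^3, \tag{11.44} $$
is a 3-form. All the coefficients are skew-symmetric tensors, so, for example,
$$ \Omega_{\mu\nu\sigma} = \Omega_{\nu\sigma\mu} = \Omega_{\sigma\mu\nu} = -\Omega_{\nu\mu\sigma} = -\Omega_{\mu\sigma\nu} = -\Omega_{\sigma\nu\mu}. \tag{11.45} $$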


In each example we have explicitly written out all the independent terms for 
the case of three dimensions. Note how the p! disappears when we do this 
and keep only distinct components. In d dimensions the space of p-forms is 
d!/p!(d — p)! dimensional, and all p-forms with p > d vanish identically. 

As with the wedge products in chapter one, we regard a $p$-form as a $p$-linear
skew-symmetric function with $p$ slots into which we can drop vectors to
get a number. For example the basis two-forms give
$$ dx^\mu \wedge dx^\nu(\partial_\alpha, \partial_\beta) = \delta^\mu_\alpha \delta^\nu_\beta - \delta^\mu_\beta \delta^\nu_\alpha. \tag{11.46} $$


The analogous expression for a p-form would have p! terms. We can define 
an algebra of differential forms by “wedging” them together in the obvious 
way, so that the product of a p-form with a q-form is a (p+ q)-form. The 
wedge product is associative and distributive but not, of course, commuta- 
tive. Instead, if a is a p-form and b a q-form, then 


$$ a \wedge b = (-1)^{pq}\, b \wedge a. \tag{11.47} $$
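
The counting of independent components and the graded commutation rule (11.47) can both be checked mechanically. The following is a minimal sketch, not a standard library: it stores a form with constant coefficients as a dictionary from increasing index tuples to numbers, and the helper names `sort_sign`, `wedge` and `scale` are invented for this illustration.

```python
from itertools import combinations
from math import comb

def sort_sign(indices):
    """Return (sign, sorted_tuple) for basis indices; sign is 0 if an index repeats."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    # Bubble sort, flipping the sign at every transposition of neighbours.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(a, b):
    """Wedge product of two forms stored as {increasing index tuple: coefficient}."""
    out = {}
    for ia, ca in a.items():
        for ib, cb in b.items():
            sign, key = sort_sign(ia + ib)
            if sign:
                out[key] = out.get(key, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def scale(c, a):
    """Multiply every coefficient of the form a by the number c."""
    return {k: c * v for k, v in a.items()}

d = 4
basis_2forms = list(combinations(range(d), 2))  # independent dx^mu ^ dx^nu with mu < nu
a = {(0,): 2, (2,): -1}       # a 1-form
b = {(1,): 3, (3,): 5}        # another 1-form
c = {(0, 1): 1, (2, 3): 4}    # a 2-form

print(len(basis_2forms), comb(d, 2))
print(wedge(a, b), wedge(b, a))
```

With $p = q = 1$ the two products printed on the last line differ by an overall sign, while a 1-form and a 2-form commute because $pq$ is then even.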


Actually it is customary in this game to suppress the “$\wedge$” and simply write
$F = \frac{1}{2} F_{\mu\nu}\, dx^\mu dx^\nu$, it being assumed that you know that $dx^\mu dx^\nu = -dx^\nu dx^\mu$;
what else could it be?


11.3.2 The exterior derivative


These $p$-forms may seem rather complicated, so it is perhaps surprising that
all the vector calculus (div, grad, curl, the divergence theorem and Stokes’
theorem, etc.) that you have learned in the past reduces, in terms of them,
to two simple formulae! Indeed Élie Cartan’s calculus of $p$-forms is slowly
supplanting traditional vector calculus, much as Willard Gibbs’ and Oliver
Heaviside’s vector calculus supplanted the tedious component-by-component
formulae you find in Maxwell’s Treatise on Electricity and Magnetism.

The basic tool is the exterior derivative “d”, which we now define ax- 
iomatically: 

i) If $f$ is a function (0-form), then $df$ coincides with the previous definition,
i.e. $df(X) = Xf$ for any vector field $X$.
ii) $d$ is an anti-derivation: If $a$ is a $p$-form and $b$ a $q$-form then
$$ d(a \wedge b) = da \wedge b + (-1)^p a \wedge db. \tag{11.48} $$
iii) Poincaré’s lemma: $d^2 = 0$, meaning that $d(da) = 0$ for any $p$-form $a$.
iv) $d$ is linear. That $d(\alpha a) = \alpha\, da$, for constant $\alpha$, follows already from i)
and ii), so the new fact is that $d(a + b) = da + db$.
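It is not immediately obvious that axioms i), ii) and iii) are compatible
with one another. If we use axioms i), ii) and $d(dx^i) = 0$ to compute the $d$ of
$\Omega = \frac{1}{p!} \Omega_{i_1 \ldots i_p}\, dx^{i_1} \cdots dx^{i_p}$, we find
$$ d\Omega = \frac{1}{p!} (d\Omega_{i_1 \ldots i_p})\, dx^{i_1} \cdots dx^{i_p} = \frac{1}{p!} \partial_k \Omega_{i_1 \ldots i_p}\, dx^k dx^{i_1} \cdots dx^{i_p}. \tag{11.49} $$
Now compute
$$ d(d\Omega) = \frac{1}{p!} (\partial_l \partial_k \Omega_{i_1 \ldots i_p})\, dx^l dx^k dx^{i_1} \cdots dx^{i_p}. \tag{11.50} $$
Fortunately this is zero because $\partial_l \partial_k = \partial_k \partial_l$, while $dx^l dx^k = -dx^k dx^l$.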
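As another example let $A = A_1 dx^1 + A_2 dx^2 + A_3 dx^3$, then
$$ \begin{aligned} dA &= \left( \frac{\partial A_2}{\partial x^1} - \frac{\partial A_1}{\partial x^2} \right) dx^1 dx^2 + \left( \frac{\partial A_1}{\partial x^3} - \frac{\partial A_3}{\partial x^1} \right) dx^3 dx^1 + \left( \frac{\partial A_3}{\partial x^2} - \frac{\partial A_2}{\partial x^3} \right) dx^2 dx^3 \\ &= \frac{1}{2} F_{\mu\nu}\, dx^\mu dx^\nu, \end{aligned} \tag{11.51} $$
where
$$ F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{11.52} $$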


You will recognize the components of curl A hiding in here. 
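Again, if $F = F_{12}\, dx^1 dx^2 + F_{23}\, dx^2 dx^3 + F_{31}\, dx^3 dx^1$ then
$$ dF = \left( \frac{\partial F_{23}}{\partial x^1} + \frac{\partial F_{31}}{\partial x^2} + \frac{\partial F_{12}}{\partial x^3} \right) dx^1 dx^2 dx^3. \tag{11.53} $$
This looks like a divergence.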

The axiom $d^2 = 0$ encompasses both “curl grad = 0” and “div curl = 0”,
together with an infinite number of higher-dimensional analogues. The
familiar “curl $= \nabla \times$”, meanwhile, is only defined in three dimensional space.
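
A small worked instance may help fix the mechanics of axioms i) to iii). The function $f = x^2 y$ and the one-form $A = -y\,dx + x\,dy$ are arbitrary illustrative choices, not examples taken from the text.

```latex
% d applied twice to a function: the two cross terms cancel.
f = x^2 y, \qquad df = 2xy\,dx + x^2\,dy,
\\
d(df) = d(2xy)\,dx + d(x^2)\,dy = 2x\,dy\,dx + 2x\,dx\,dy = 0.
\\[1ex]
% d applied to a one-form that is not a gradient: the "curl" survives.
A = -y\,dx + x\,dy, \qquad dA = -dy\,dx + dx\,dy = 2\,dx\,dy.
```

The first computation is “curl grad = 0” in two dimensions; the second shows a one-form whose exterior derivative is non-zero, so it cannot be $df$ for any $f$.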

The exterior derivative takes $p$-forms to $(p+1)$-forms, i.e. skew-symmetric
type $(0,p)$ tensors to skew-symmetric $(0,p+1)$ tensors. How does “$d$” get
around the fact that the derivative of a tensor is not a tensor? Well, if
you apply the transformation law for $A_\mu$, and the chain rule to $\partial/\partial x^\mu$, to find
the transformation law for $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$, you will see why: all the
derivatives of the co-ordinate-transformation factors cancel, and $F_{\mu\nu}$ is a bona-fide tensor of type $(0,2)$. This
sort of cancellation is why skew-symmetric objects are useful, and symmetric
ones less so.


Exercise 11.4: Use axiom ii) to compute $d(d(a \wedge b))$ and confirm that it is zero.


Closed and exact forms 


The Poincaré lemma, $d^2 = 0$, leads to some important terminology:

i) A $p$-form $\omega$ is said to be closed if $d\omega = 0$.

ii) A $p$-form $\omega$ is said to be exact if $\omega = d\eta$ for some $(p-1)$-form $\eta$.
An exact form is necessarily closed, but a closed form is not necessarily exact.
The question of when closed $\Rightarrow$ exact is one involving the global topology of
the space in which the forms are defined, and will be the subject of chapter 13.
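
A standard illustration of the gap between closed and exact, offered here as a sketch rather than as an example drawn from this section, is the “angle form” on the plane with the origin removed:

```latex
\eta = \frac{x\,dy - y\,dx}{x^2 + y^2}
\quad \text{on } \mathbb{R}^2 \setminus \{0\},
\\
d\eta = \left[ \frac{\partial}{\partial x}\!\left(\frac{x}{x^2+y^2}\right)
      + \frac{\partial}{\partial y}\!\left(\frac{y}{x^2+y^2}\right) \right] dx\,dy
      = \left[ \frac{y^2 - x^2}{(x^2+y^2)^2} + \frac{x^2 - y^2}{(x^2+y^2)^2} \right] dx\,dy = 0,
\\
\oint_{x^2+y^2=1} \eta = \int_0^{2\pi} d\phi = 2\pi \neq 0 .
```

Locally $\eta$ is the differential of the polar angle $\phi$, so it is closed, but the non-zero integral around the unit circle shows that no single-valued function $f$ on the punctured plane has $df = \eta$. The obstruction is the hole at the origin, which is the kind of global topological information referred to above.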


Cartan’s formulae


It is sometimes useful to have expressions for the action of d coupled with 
the evaluation of the subsequent (p + 1) forms. 

If $f$, $\eta$, $\omega$ are 0, 1, 2-forms, respectively, then $df$, $d\eta$, $d\omega$ are 1, 2, 3-forms.
When we plug in the appropriate number of vector fields $X$, $Y$, $Z$, then, after
some labour, we will find
$$ df(X) = Xf, \tag{11.54} $$
$$ d\eta(X,Y) = X\eta(Y) - Y\eta(X) - \eta([X,Y]), \tag{11.55} $$
$$ \begin{aligned} d\omega(X,Y,Z) = {} & X\omega(Y,Z) + Y\omega(Z,X) + Z\omega(X,Y) \\ & - \omega([X,Y],Z) - \omega([Y,Z],X) - \omega([Z,X],Y). \end{aligned} \tag{11.56} $$


These formulae, and their higher-$p$ analogues, express $d$ in terms of geometric
objects, and so make it clear that the exterior derivative is itself a geometric
object, independent of any particular co-ordinate choice.

Let us demonstrate the correctness of the second formula. With $\eta = \eta_\mu dx^\mu$,
the left-hand side, $d\eta(X,Y)$, is equal to
$$ \partial_\mu \eta_\nu\, dx^\mu dx^\nu(X,Y) = \partial_\mu \eta_\nu (X^\mu Y^\nu - X^\nu Y^\mu). \tag{11.57} $$
The right-hand side is equal to
$$ X^\mu \partial_\mu(\eta_\nu Y^\nu) - Y^\mu \partial_\mu(\eta_\nu X^\nu) - \eta_\nu (X^\mu \partial_\mu Y^\nu - Y^\mu \partial_\mu X^\nu). \tag{11.58} $$
On using the product rule for the derivatives in the first two terms, we find
that all derivatives of the components of $X$ and $Y$ cancel, and are left with
exactly those terms appearing on the left.


Exercise 11.5: Let $\omega^i$, $i = 1, \ldots, r$, be a linearly independent set of one-forms
defining a Pfaffian system (see sec. 11.2.1) in $d$ dimensions.

i) Use Cartan’s formulae to show that the corresponding $(d-r)$-dimensional
distribution is involutive if and only if there is an $r$-by-$r$ matrix of 1-forms
$\theta^i{}_j$ such that
$$ d\omega^i = \sum_{j=1}^{r} \theta^i{}_j \wedge \omega^j. $$
ii) Show that the conditions in part i) are satisfied if there are $r$ functions
$g^i$ and an invertible $r$-by-$r$ matrix of functions $f^i{}_j$ such that
$$ \omega^i = \sum_{j=1}^{r} f^i{}_j\, dg^j. $$
In this case foliation surfaces are given by the conditions $g^i(x) = \text{const.}$,
$i = 1, \ldots, r$.

It is also possible, but considerably harder, to show that i) $\Rightarrow$ ii). Doing so
would constitute a proof of Frobenius’ theorem.


Exercise 11.6: Let $\omega$ be a closed two-form, and let $\mathrm{Null}(\omega)$ be the space of
vector fields $X$ such that $\omega(X,\ ) = 0$. Use the Cartan formulae to show that
if $X, Y \in \mathrm{Null}(\omega)$, then $[X,Y] \in \mathrm{Null}(\omega)$.


Lie derivative of forms 


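Given a $p$-form $\omega$ and a vector field $X$, we can form a $(p-1)$-form called
$i_X\omega$ by writing
$$ i_X\omega(\underbrace{\ldots\ldots}_{p-1\ \text{slots}}) = \omega(\overbrace{X, \underbrace{\ldots\ldots}_{p-1\ \text{slots}}}^{p\ \text{slots}}). \tag{11.59} $$
Acting on a 0-form, $i_X$ is defined to be 0. This procedure is called the interior
multiplication by $X$. It is simply a contraction
$$ \omega_{j_1 j_2 \ldots j_p} \to \omega_{k j_2 \ldots j_p} X^k, \tag{11.60} $$
but it is convenient to have a special symbol for this operation. It is perhaps
surprising that $i_X$ turns out to be an anti-derivation, just as is $d$. If $\eta$ and $\omega$
are $p$ and $q$ forms respectively, then
$$ i_X(\eta \wedge \omega) = (i_X\eta) \wedge \omega + (-1)^p \eta \wedge (i_X\omega), \tag{11.61} $$
even though $i_X$ involves no differentiation. For example, if $X = X^\mu\partial_\mu$, then
$$ \begin{aligned} i_X(dx^\mu \wedge dx^\nu) &= dx^\mu \wedge dx^\nu(X^\alpha\partial_\alpha,\ ) \\ &= X^\mu dx^\nu - dx^\mu X^\nu \\ &= (i_X dx^\mu) \wedge (dx^\nu) - dx^\mu \wedge (i_X dx^\nu). \end{aligned} \tag{11.62} $$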


One reason for introducing $i_X$ is that there is a nice (and profound)
formula for the Lie derivative of a $p$-form in terms of $i_X$. The formula is
called the infinitesimal homotopy relation. It reads
$$ \mathcal{L}_X\omega = (d\, i_X + i_X d)\,\omega. \tag{11.63} $$


This formula is proved by verifying that it is true for functions and one-forms,
and then showing that it is a derivation, in other words that it
satisfies Leibniz’ rule. From the derivation property of the Lie derivative, we
immediately deduce that the formula works for any $p$-form.
That the formula is true for functions should be obvious: Since $i_X f = 0$
by definition, we have
$$ (d\, i_X + i_X d) f = i_X df = df(X) = Xf = \mathcal{L}_X f. \tag{11.64} $$


To show that the formula works for one forms, we evaluate
$$ \begin{aligned} (d\, i_X + i_X d)(f_\mu dx^\mu) &= d(f_\mu X^\mu) + i_X(\partial_\nu f_\mu\, dx^\nu dx^\mu) \\ &= \partial_\nu(f_\mu X^\mu)\, dx^\nu + \partial_\nu f_\mu (X^\nu dx^\mu - X^\mu dx^\nu) \\ &= (X^\nu \partial_\nu f_\mu + f_\nu \partial_\mu X^\nu)\, dx^\mu. \end{aligned} \tag{11.65} $$
In going from the second to the third line, we have interchanged the dummy
labels $\mu \leftrightarrow \nu$ in the term containing $dx^\nu$. We recognize that the 1-form in
the last line is indeed $\mathcal{L}_X f$.

To show that $d\, i_X + i_X d$ is a derivation we must apply $d\, i_X + i_X d$ to $a \wedge b$
and use the anti-derivation property of $i_X$ and $d$. This is straightforward once
we recall that $d$ takes a $p$-form to a $(p+1)$-form while $i_X$ takes a $p$-form to
a $(p-1)$-form.


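Exercise 11.7: Let
$$ \omega = \frac{1}{p!} \omega_{i_1 \ldots i_p}\, dx^{i_1} \cdots dx^{i_p}. $$
Use the anti-derivation property of $i_X$ to show that
$$ i_X\omega = \frac{1}{(p-1)!} \omega_{\alpha i_2 \ldots i_p} X^\alpha\, dx^{i_2} \cdots dx^{i_p}, $$
and so verify the equivalence of (11.59) and (11.60).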


Exercise 11.8: Use the infinitesimal homotopy relation to show that $\mathcal{L}$ and $d$
commute, i.e. for $\omega$ a $p$-form, we have
$$ d(\mathcal{L}_X\omega) = \mathcal{L}_X(d\omega). $$


11.4 Physical applications


11.4.1 Maxwell’s equations 


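In relativistic³ four-dimensional tensor notation the two source-free Maxwell’s
equations
$$ \begin{aligned} \mathrm{curl}\,\mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}, \\ \mathrm{div}\,\mathbf{B} &= 0, \end{aligned} \tag{11.66} $$
reduce to the single equation
$$ \frac{\partial F_{\mu\nu}}{\partial x^\lambda} + \frac{\partial F_{\nu\lambda}}{\partial x^\mu} + \frac{\partial F_{\lambda\mu}}{\partial x^\nu} = 0, \tag{11.67} $$
where
$$ F_{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{pmatrix}. \tag{11.68} $$
The “F” is traditional, for Michael Faraday. In form language, the relativistic
equation becomes the even more compact expression $dF = 0$, where
$$ \begin{aligned} F &= \frac{1}{2} F_{\mu\nu}\, dx^\mu dx^\nu \\ &= B_x\, dydz + B_y\, dzdx + B_z\, dxdy + E_x\, dxdt + E_y\, dydt + E_z\, dzdt, \end{aligned} \tag{11.69} $$
is a Minkowski-space 2-form.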


Exercise 11.9: Verify that the source-free Maxwell equations are indeed equiv- 
alent to dF = 0. 


The equation $dF = 0$ is automatically satisfied if we introduce a 4-vector
1-form potential $A = -\phi\, dt + A_x dx + A_y dy + A_z dz$ and set $F = dA$.
The two Maxwell equations with sources
$$ \begin{aligned} \mathrm{div}\,\mathbf{D} &= \rho, \\ \mathrm{curl}\,\mathbf{H} &= \mathbf{j} + \frac{\partial \mathbf{D}}{\partial t}, \end{aligned} \tag{11.70} $$
reduce in 4-tensor notation to the single equation
$$ \partial_\mu F^{\mu\nu} = -J^\nu. \tag{11.71} $$
Here $J^\mu = (\rho, \mathbf{j})$ is the current 4-vector.

³In this section we will use units in which $c = \epsilon_0 = \mu_0 = 1$. We take the Minkowski
metric to be $g_{\mu\nu} = \mathrm{diag}(-1,1,1,1)$ where $x^0 = t$, $x^1 = x$, etc.

This source equation takes a little more work to express in form language,
but it can be done. We need a new concept: the Hodge “star” dual of a form.
In $d$ dimensions the “$\star$” map takes a $p$-form to a $(d-p)$-form. It depends
on both the metric and the orientation. The latter means a canonical choice
of the order in which to write our basis forms, with orderings that differ
by an even permutation being counted as the same. The full $d$-dimensional
definition involves the Levi-Civita duality operation of chapter 10, combined
with the use of the metric tensor to raise indices. Recall that $\sqrt{g} = \sqrt{\det g_{\mu\nu}}$.
(In Minkowski-signature metrics we should replace $\sqrt{g}$ by $\sqrt{-g}$.) We define
“$\star$” to be a linear map
$$ \star : \bigwedge\nolimits^{p}(T^*M) \to \bigwedge\nolimits^{(d-p)}(T^*M) \tag{11.72} $$
such that
$$ \star\, dx^{i_1} \cdots dx^{i_p} \stackrel{\mathrm{def}}{=} \frac{1}{(d-p)!} \sqrt{g}\, g^{i_1 j_1} \cdots g^{i_p j_p}\, \epsilon_{j_1 \cdots j_p j_{p+1} \cdots j_d}\, dx^{j_{p+1}} \cdots dx^{j_d}. \tag{11.73} $$
Although this definition looks a trifle involved, computations involving it are
not so intimidating. The trick is to work, whenever possible, with oriented
orthonormal frames. If we are in euclidean space and $\{e^{*i_1}, e^{*i_2}, \ldots, e^{*i_d}\}$ is an
ordering of the orthonormal basis for $(T^*M)_x$ whose orientation is equivalent
to $\{e^{*1}, e^{*2}, \ldots, e^{*d}\}$ then
$$ \star(e^{*i_1} \wedge e^{*i_2} \wedge \cdots \wedge e^{*i_p}) = e^{*i_{p+1}} \wedge e^{*i_{p+2}} \wedge \cdots \wedge e^{*i_d}. \tag{11.74} $$
For example, in three dimensions, and with $x$, $y$, $z$ our usual Cartesian co-ordinates,
we have
$$ \begin{aligned} \star dx &= dydz, \\ \star dy &= dzdx, \\ \star dz &= dxdy. \end{aligned} \tag{11.75} $$


An analogous method works for Minkowski-signature $(-,+,+,+)$ metrics,
except that now we must include a minus sign for each negatively normed
$dt$ factor in the form being “starred.” Taking $\{dt, dx, dy, dz\}$ as our oriented
basis, we therefore find⁴
$$ \begin{aligned} \star dxdy &= -dzdt, \\ \star dydz &= -dxdt, \\ \star dzdx &= -dydt, \\ \star dxdt &= dydz, \\ \star dydt &= dzdx, \\ \star dzdt &= dxdy. \end{aligned} \tag{11.76} $$
For example, the first of these equations is derived by observing that $(dxdy)(-dzdt) = dtdxdydz$,
and that there is no “$dt$” in the product $dxdy$. The fourth follows
from observing that $(dxdt)(-dydz) = dtdxdydz$, but there is a
negative-normed “$dt$” in the product $dxdt$.
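
The orthonormal-frame rule just described is mechanical enough to automate. The sketch below is an illustration under stated assumptions, not a general implementation of (11.73): it works only in an oriented orthonormal frame, labels the frame one-forms by integers (0, 1, 2 for $x, y, z$ in the euclidean case; 0, 1, 2, 3 for $t, x, y, z$ in the Minkowski case), and the function names `perm_sign`, `star_basis` and `star` are invented here.

```python
def perm_sign(seq):
    """Sign of the permutation that sorts seq (distinct integers), by counting inversions."""
    inversions = sum(1 for i in range(len(seq))
                     for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1

def star_basis(indices, eta):
    """Hodge dual of the basis form e^{i1} ... e^{ip} in an oriented orthonormal frame.

    indices: tuple of distinct frame labels, in the order written.
    eta: list of +1/-1 norms of the frame one-forms.
    Returns (sign, complement), meaning star = sign * e^{complement}, with the
    complementary labels listed in increasing order.
    """
    d = len(eta)
    complement = tuple(k for k in range(d) if k not in indices)
    # (form)(complement) must be a positive multiple of the oriented volume form.
    sign = perm_sign(indices + complement)
    for k in indices:  # one factor of -1 per negatively normed one-form being starred
        sign *= eta[k]
    return sign, complement

def star(form, eta):
    """Linear extension to forms stored as {index tuple: coefficient}."""
    out = {}
    for idx, coeff in form.items():
        sign, comp = star_basis(idx, eta)
        out[comp] = out.get(comp, 0) + sign * coeff
    return out

EUCLID3 = [1, 1, 1]         # labels 0, 1, 2 = x, y, z
MINKOWSKI = [-1, 1, 1, 1]   # labels 0, 1, 2, 3 = t, x, y, z

print(star_basis((1,), EUCLID3))       # star dy, as a multiple of dx dz
print(star_basis((1, 2), MINKOWSKI))   # star dx dy, as a multiple of dt dz
print(star_basis((1, 0), MINKOWSKI))   # star dx dt, as a multiple of dy dz
```

Because the complement is always returned in increasing order, signs must be compared with care: $\star dy = -dx\,dz$ is the same statement as $\star dy = dz\,dx$ in (11.75), and $\star dxdy = dt\,dz$ is the same as $-dz\,dt$ in (11.76).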

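The $\star$ map is constructed so that if
$$ \alpha = \frac{1}{p!} \alpha_{i_1 i_2 \ldots i_p}\, dx^{i_1} dx^{i_2} \cdots dx^{i_p}, \tag{11.77} $$
and
$$ \beta = \frac{1}{p!} \beta_{i_1 i_2 \ldots i_p}\, dx^{i_1} dx^{i_2} \cdots dx^{i_p}, \tag{11.78} $$
then
$$ \alpha \wedge (\star\beta) = \beta \wedge (\star\alpha) = \langle \alpha, \beta \rangle\, \sigma, \tag{11.79} $$
where the inner product $\langle \alpha, \beta \rangle$ is defined to be the invariant
$$ \langle \alpha, \beta \rangle = \frac{1}{p!}\, g^{i_1 j_1} g^{i_2 j_2} \cdots g^{i_p j_p}\, \alpha_{i_1 i_2 \ldots i_p} \beta_{j_1 j_2 \ldots j_p}, \tag{11.80} $$
and $\sigma$ is the volume form
$$ \sigma = \sqrt{g}\, dx^1 dx^2 \cdots dx^d. \tag{11.81} $$
In future we will write $\alpha \star \beta$ for $\alpha \wedge (\star\beta)$. Bear in mind that the “$\star$” in this
expression is acting on $\beta$ and is not some new kind of binary operation.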
We now apply these ideas to Maxwell. From the field-strength 2-form
$$ F = B_x\, dydz + B_y\, dzdx + B_z\, dxdy + E_x\, dxdt + E_y\, dydt + E_z\, dzdt, \tag{11.82} $$
we get a dual 2-form
$$ \star F = -B_x\, dxdt - B_y\, dydt - B_z\, dzdt + E_x\, dydz + E_y\, dzdx + E_z\, dxdy. \tag{11.83} $$

⁴See for example: Misner, Thorne and Wheeler, Gravitation, (MTW) page 108.


We can check that we have correctly computed the Hodge star of $F$ by taking
the wedge product, for which we find
$$ F \star F = \frac{1}{2} (F_{\mu\nu} F^{\mu\nu})\, \sigma = (B_x^2 + B_y^2 + B_z^2 - E_x^2 - E_y^2 - E_z^2)\, dtdxdydz. \tag{11.84} $$


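Observe that the expression $B^2 - E^2$ is a Lorentz scalar. Similarly, from the
current 1-form
$$ J \equiv J_\mu dx^\mu = -\rho\, dt + j_x dx + j_y dy + j_z dz, \tag{11.85} $$
we derive the dual current 3-form
$$ \star J = \rho\, dxdydz - j_x\, dtdydz - j_y\, dtdzdx - j_z\, dtdxdy, \tag{11.86} $$
and check that
$$ J \star J = (J_\mu J^\mu)\, \sigma = (-\rho^2 + j_x^2 + j_y^2 + j_z^2)\, dtdxdydz. \tag{11.87} $$
Observe that
$$ d \star J = \left( \frac{\partial \rho}{\partial t} + \mathrm{div}\,\mathbf{j} \right) dtdxdydz = 0, \tag{11.88} $$
expresses the charge conservation law.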

Writing out the terms explicitly shows that the source-containing Maxwell
equations reduce to $d \star F = \star J$. All four Maxwell equations are therefore very
compactly expressed as
$$ dF = 0, \qquad d \star F = \star J. $$
Observe that current conservation $d \star J = 0$ follows from the second Maxwell
equation as a consequence of $d^2 = 0$.


Exercise 11.10: Show that for a $p$-form $\omega$ in $d$ euclidean dimensions we have
$$ \star\star\omega = (-1)^{p(d-p)}\omega. $$
Show, further, that for a Minkowski metric an additional minus sign has to be
inserted. (For example, $\star\star F = -F$, even though $(-1)^{2(4-2)} = +1$.)


11.4.2 Hamilton’s equations


Hamiltonian dynamics takes place in phase space, a manifold with co-ordinates
$(q^1, \ldots, q^n, p^1, \ldots, p^n)$. Since momentum is a naturally covariant vector,⁵
phase space is usually the co-tangent bundle $T^*M$ of the configuration manifold
$M$. We are writing the indices on the $p$’s upstairs though, because we
are considering them as co-ordinates in $T^*M$.

We expect that you are familiar with Hamilton’s equations in their $q$, $p$
setting. Here, we shall describe them as they appear in a modern book on
Mechanics, such as Abraham and Marsden’s Foundations of Mechanics, or
V. I. Arnold’s Mathematical Methods of Classical Mechanics.

Phase space is an example of a symplectic manifold, a manifold equipped
with a symplectic form: a closed, non-degenerate, 2-form field
$$ \omega = \frac{1}{2} \omega_{ij}\, dx^i dx^j. \tag{11.89} $$
Recall that the word closed means that $d\omega = 0$. Non-degenerate means that
for any point $x$ the statement that $\omega(X,Y) = 0$ for all vectors $Y \in TM_x$
implies that $X = 0$ at that point (or equivalently that for all $x$ the matrix
$\omega_{ij}(x)$ has an inverse $\omega^{ij}(x)$).

Given a Hamiltonian function $H$ on our symplectic manifold, we define
a velocity vector-field $v_H$ by solving
$$ dH = -i_{v_H}\omega = -\omega(v_H,\ ) \tag{11.90} $$
for $v_H$. If the symplectic form is $\omega = dp^1 dq^1 + dp^2 dq^2 + \cdots + dp^n dq^n$, this is
nothing but a fancy form of Hamilton’s equations. To see this, we write
$$ dH = \frac{\partial H}{\partial q^i}\, dq^i + \frac{\partial H}{\partial p^i}\, dp^i \tag{11.91} $$
and use the customary notation $(\dot q^i, \dot p^i)$ for the velocity-in-phase-space components,
so that
$$ v_H = \dot q^i \frac{\partial}{\partial q^i} + \dot p^i \frac{\partial}{\partial p^i}. \tag{11.92} $$
Now we work out
$$ \begin{aligned} i_{v_H}\omega &= dp^i dq^i (\dot q^j \partial_{q^j} + \dot p^j \partial_{p^j}) \\ &= \dot p^i\, dq^i - \dot q^i\, dp^i, \end{aligned} \tag{11.93} $$
so, comparing coefficients of $dp^i$ and $dq^i$ on the two sides of $dH = -i_{v_H}\omega$, we
read off
$$ \dot q^i = \frac{\partial H}{\partial p^i}, \qquad \dot p^i = -\frac{\partial H}{\partial q^i}. \tag{11.94} $$

⁵To convince yourself of this, remember that in quantum mechanics $\hat p_\mu = -i\hbar\, \partial/\partial x^\mu$, and
the gradient of a function is a covector.

Darboux’ theorem, which we will not try to prove, says that for any point $x$
we can always find co-ordinates $p$, $q$, valid in some neighbourhood of $x$, such
that $\omega = dp^1 dq^1 + dp^2 dq^2 + \cdots + dp^n dq^n$. Given this fact, it is not unreasonable
to think that there is little to be gained by using the abstract differential-form
language. In simple cases this is so, and the traditional methods are fine.
It may be, however, that the neighbourhood of $x$ where the Darboux co-ordinates
work is not the entire phase space, and we need to cover the space
with overlapping $p$, $q$ co-ordinate charts. Then, what is a $p$ in one chart
will usually be a combination of $p$’s and $q$’s in another. In this case, the
traditional form of Hamilton’s equations loses its appeal in comparison to
the co-ordinate-free $dH = -i_{v_H}\omega$.
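
For a single degree of freedom in Darboux co-ordinates, solving $dH = -i_{v_H}\omega$ for $v_H$ amounts to two partial derivatives, and the flow can then be followed numerically. The sketch below makes several assumptions that are not in the text: $\omega = dp\,dq$ with one pair $(q, p)$, partial derivatives estimated by central differences, a fourth-order Runge-Kutta integrator, and a unit-frequency harmonic oscillator as the test Hamiltonian. The names `velocity_field`, `flow` and `oscillator` are illustrative.

```python
import math

def velocity_field(H, q, p, h=1e-5):
    """Solve dH = -i_v omega for v = (qdot, pdot), with omega = dp dq.

    The partial derivatives of H are estimated by central differences.
    """
    dH_dq = (H(q + h, p) - H(q - h, p)) / (2 * h)
    dH_dp = (H(q, p + h) - H(q, p - h)) / (2 * h)
    # i_v omega = pdot dq - qdot dp, so dH = -i_v omega gives
    # qdot = dH/dp and pdot = -dH/dq.
    return dH_dp, -dH_dq

def flow(H, q, p, dt, steps):
    """Follow the flow of v_H with a classical fourth-order Runge-Kutta step."""
    for _ in range(steps):
        k1 = velocity_field(H, q, p)
        k2 = velocity_field(H, q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
        k3 = velocity_field(H, q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
        k4 = velocity_field(H, q + dt * k3[0], p + dt * k3[1])
        q += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        p += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return q, p

def oscillator(q, p):
    """Test Hamiltonian H = (p^2 + q^2) / 2."""
    return 0.5 * (p * p + q * q)

# Start at (q, p) = (1, 0) and evolve to t = 1; the exact solution is (cos t, -sin t).
q1, p1 = flow(oscillator, 1.0, 0.0, dt=0.01, steps=100)
print(q1, p1, math.cos(1.0), -math.sin(1.0))
```

The point of the exercise is only that the abstract equation and the familiar $\dot q = \partial H/\partial p$, $\dot p = -\partial H/\partial q$ generate the same motion; nothing here addresses the multi-chart situation described above.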

Given two functions $H_1$, $H_2$ we can define their Poisson bracket $\{H_1, H_2\}$.
Its importance lies in Dirac’s observation that the passage from classical
mechanics to quantum mechanics is accomplished by replacing the Poisson
bracket of two quantities, $A$ and $B$, with the commutator of the corresponding
operators $\hat A$ and $\hat B$:
$$ i[\hat A, \hat B] \longleftrightarrow \hbar\{A, B\} + O(\hbar^2). \tag{11.95} $$
We define the Poisson bracket by⁶
$$ \{H_1, H_2\} \stackrel{\mathrm{def}}{=} \left. \frac{dH_2}{dt} \right|_{H_1} = v_{H_1} H_2. \tag{11.96} $$
Now, $v_{H_1} H_2 = dH_2(v_{H_1})$, and Hamilton’s equations say that $dH_2(v_{H_1}) = \omega(v_{H_1}, v_{H_2})$. Thus,
$$ \{H_1, H_2\} = \omega(v_{H_1}, v_{H_2}). \tag{11.97} $$
The skew symmetry of $\omega(v_{H_1}, v_{H_2})$ shows that despite the asymmetrical
appearance of the definition we have skew symmetry: $\{H_1, H_2\} = -\{H_2, H_1\}$.
Moreover, since
$$ v_{H_1}(H_2 H_3) = (v_{H_1} H_2) H_3 + H_2 (v_{H_1} H_3), \tag{11.98} $$
the Poisson bracket is a derivation:
$$ \{H_1, H_2 H_3\} = \{H_1, H_2\} H_3 + H_2 \{H_1, H_3\}. \tag{11.99} $$

⁶Our definition differs in sign from the traditional one, but has the advantage of minimizing
the number of minus signs in subsequent equations.


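Neither the skew symmetry nor the derivation property requires the condition
that $d\omega = 0$. What does need $\omega$ to be closed is the Jacobi identity:
$$ \{\{H_1, H_2\}, H_3\} + \{\{H_2, H_3\}, H_1\} + \{\{H_3, H_1\}, H_2\} = 0. \tag{11.100} $$
We establish Jacobi by using Cartan’s formula in the form
$$ \begin{aligned} d\omega(v_{H_1}, v_{H_2}, v_{H_3}) = {} & v_{H_1}\omega(v_{H_2}, v_{H_3}) + v_{H_2}\omega(v_{H_3}, v_{H_1}) + v_{H_3}\omega(v_{H_1}, v_{H_2}) \\ & - \omega([v_{H_1}, v_{H_2}], v_{H_3}) - \omega([v_{H_2}, v_{H_3}], v_{H_1}) - \omega([v_{H_3}, v_{H_1}], v_{H_2}). \end{aligned} \tag{11.101} $$
It is relatively straightforward to interpret each term in the first of these
lines as Poisson brackets. For example,
$$ v_{H_1}\omega(v_{H_2}, v_{H_3}) = v_{H_1}\{H_2, H_3\} = \{H_1, \{H_2, H_3\}\}. \tag{11.102} $$
Relating the terms in the second line to Poisson brackets requires a little
more effort. We proceed as follows:
$$ \begin{aligned} \omega([v_{H_1}, v_{H_2}], v_{H_3}) &= -\omega(v_{H_3}, [v_{H_1}, v_{H_2}]) \\ &= dH_3([v_{H_1}, v_{H_2}]) \\ &= [v_{H_1}, v_{H_2}] H_3 \\ &= v_{H_1}(v_{H_2} H_3) - v_{H_2}(v_{H_1} H_3) \\ &= \{H_1, \{H_2, H_3\}\} - \{H_2, \{H_1, H_3\}\} \\ &= \{H_1, \{H_2, H_3\}\} + \{H_2, \{H_3, H_1\}\}. \end{aligned} \tag{11.103} $$
Adding everything together, the first line of (11.101) contributes the cyclic sum
of the $\{H_1, \{H_2, H_3\}\}$ once and the second line removes it twice, so that
$$ \begin{aligned} 0 &= d\omega(v_{H_1}, v_{H_2}, v_{H_3}) \\ &= \{\{H_1, H_2\}, H_3\} + \{\{H_2, H_3\}, H_1\} + \{\{H_3, H_1\}, H_2\}. \end{aligned} \tag{11.104} $$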

If we rearrange the Jacobi identity as
$$ \{H_1, \{H_2, H_3\}\} - \{H_2, \{H_1, H_3\}\} = \{\{H_1, H_2\}, H_3\}, \tag{11.105} $$
we see that it is equivalent to
$$ [v_{H_1}, v_{H_2}] = v_{\{H_1, H_2\}}. $$
The algebra of Poisson brackets is therefore homomorphic to the algebra of
the Lie brackets. The correspondence is not an isomorphism, however: the
assignment $H \mapsto v_H$ fails to be one-to-one because constant functions map
to the zero vector field.


Exercise 11.11: Use the infinitesimal homotopy relation to show that $\mathcal{L}_{v_H}\omega = 0$,
where $v_H$ is the vector field corresponding to $H$. Suppose now that the phase
space is $2n$ dimensional. Show that in local Darboux co-ordinates the $2n$-form
$\omega^n/n!$ is, up to a sign, the phase-space volume element $d^np\, d^nq$. Show that
$\mathcal{L}_{v_H}\omega^n/n! = 0$ and that this result is Liouville’s theorem on the conservation
of phase-space volume.


The classical mechanics of spin 


It is sometimes said in books on quantum mechanics that the spin of an elec- 
tron, or other elementary particle, is a purely quantum concept and cannot 
be described by classical mechanics. This statement is false, but spin is the 
simplest system in which traditional physicists’ methods become ugly and it
helps to use the modern symplectic language. A “spin” $\mathbf{S}$ can be regarded
as a fixed length vector that can point in any direction in $\mathbb{R}^3$. We will take
it to be of unit length so that its components are
$$ \begin{aligned} S_x &= \sin\theta\cos\phi, \\ S_y &= \sin\theta\sin\phi, \\ S_z &= \cos\theta, \end{aligned} \tag{11.106} $$
where $\theta$ and $\phi$ are polar co-ordinates on the two-sphere $S^2$.

The surface of the sphere turns out to be both the configuration space 
and the phase space. In particular the phase space for a spin is not the 
cotangent bundle of the configuration space. This has to be so: we learned 
from Niels Bohr that a 2n-dimensional phase space contains roughly one 
quantum state for every $h^n$ of phase-space volume. A cotangent bundle
always has infinite volume, so its corresponding Hilbert space is necessarily
infinite dimensional. A quantum spin, however, has a finite-dimensional
Hilbert space so its classical phase space must have a finite total volume.
This finite-volume phase space seems un-natural in the traditional view of
mechanics, but it fits comfortably into the modern symplectic picture.

We want to treat all points on the sphere alike, and so it is natural to
take the symplectic 2-form to be proportional to the element of area. Suppose
that $\omega = \sin\theta\, d\theta d\phi$. We could write $\omega = -d\cos\theta\, d\phi$ and regard $\phi$ as “$q$”
and $-\cos\theta$ as “$p$” (Darboux’ theorem in action!), but this identification is
singular at the north and south poles of the sphere, and, besides, it obscures
the spherical symmetry of the problem, which is manifest when we think of
$\omega$ as $d(\mathrm{area})$.

Let us take our hamiltonian to be $H = BS_x$, corresponding to an applied
magnetic field in the $x$ direction, and see what Hamilton’s equations give for
the motion. First we take the exterior derivative
$$ d(BS_x) = B(\cos\theta\cos\phi\, d\theta - \sin\theta\sin\phi\, d\phi). \tag{11.107} $$
This is to be set equal to
$$ -\omega(v_{BS_x},\ ) = v^\theta(-\sin\theta)\, d\phi + v^\phi \sin\theta\, d\theta. \tag{11.108} $$
Comparing coefficients of $d\theta$ and $d\phi$, we get
$$ v_{(BS_x)} = v^\theta\partial_\theta + v^\phi\partial_\phi = B(\sin\phi\, \partial_\theta + \cos\phi\cot\theta\, \partial_\phi), \tag{11.109} $$
i.e. $B$ times the velocity vector for a rotation about the $x$ axis. This velocity
field therefore describes a steady Larmor precession of the spin about the
applied field. This is exactly the motion predicted by quantum mechanics.
Similarly, setting $B = 1$, we find
$$ \begin{aligned} v_{S_y} &= -\cos\phi\, \partial_\theta + \sin\phi\cot\theta\, \partial_\phi, \\ v_{S_z} &= -\partial_\phi. \end{aligned} \tag{11.110} $$
From these velocity fields we can compute the Poisson brackets:
$$ \begin{aligned} \{S_x, S_y\} &= \omega(v_{S_x}, v_{S_y}) \\ &= \sin\theta\, d\theta d\phi(\sin\phi\, \partial_\theta + \cos\phi\cot\theta\, \partial_\phi,\ -\cos\phi\, \partial_\theta + \sin\phi\cot\theta\, \partial_\phi) \\ &= \sin\theta(\sin^2\phi\cot\theta + \cos^2\phi\cot\theta) \\ &= \cos\theta = S_z. \end{aligned} $$
Repeating the exercise leads to
$$ \begin{aligned} \{S_x, S_y\} &= S_z, \\ \{S_y, S_z\} &= S_x, \\ \{S_z, S_x\} &= S_y. \end{aligned} \tag{11.111} $$
These Poisson brackets for our classical “spin” are to be compared with the
commutation relations $[\hat S_x, \hat S_y] = i\hbar \hat S_z$ etc. for the quantum spin operators $\hat S_i$.
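
The three brackets in (11.111) can be confirmed numerically from the definitions alone. The sketch below assumes $\omega = \sin\theta\, d\theta d\phi$ as in the text, solves $dH = -\omega(v,\ )$ for the components of $v$, and uses $\{H_1, H_2\} = v_{H_1}H_2$ from (11.96). The central-difference gradient, the sample point, and the function names are illustrative assumptions.

```python
import math

def S_x(theta, phi):
    return math.sin(theta) * math.cos(phi)

def S_y(theta, phi):
    return math.sin(theta) * math.sin(phi)

def S_z(theta, phi):
    return math.cos(theta)

def grad(F, theta, phi, h=1e-6):
    """Central-difference estimates of (dF/dtheta, dF/dphi)."""
    return ((F(theta + h, phi) - F(theta - h, phi)) / (2 * h),
            (F(theta, phi + h) - F(theta, phi - h)) / (2 * h))

def velocity(H, theta, phi):
    """Components (v^theta, v^phi) solving dH = -omega(v, .), omega = sin(theta) dtheta dphi."""
    dH_dtheta, dH_dphi = grad(H, theta, phi)
    s = math.sin(theta)
    # -omega(v, .) = -sin(theta) v^theta dphi + sin(theta) v^phi dtheta
    return -dH_dphi / s, dH_dtheta / s

def poisson(H1, H2, theta, phi):
    """{H1, H2} = v_{H1} H2, following definition (11.96)."""
    v_theta, v_phi = velocity(H1, theta, phi)
    d_theta, d_phi = grad(H2, theta, phi)
    return v_theta * d_theta + v_phi * d_phi

theta0, phi0 = 1.1, 0.4   # an arbitrary point away from the poles
print(poisson(S_x, S_y, theta0, phi0), S_z(theta0, phi0))
print(poisson(S_y, S_z, theta0, phi0), S_x(theta0, phi0))
print(poisson(S_z, S_x, theta0, phi0), S_y(theta0, phi0))
```

Each printed pair should agree to within the finite-difference error. The same `velocity` function reproduces (11.109) and (11.110) when applied to $S_x$, $S_y$ and $S_z$.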


11.5 Covariant derivatives 


Covariant derivatives are a general class of derivatives that act on sections 
of a vector or tensor bundle over a manifold. We will begin by considering 
derivatives on the tangent bundle, and in the exercises indicate how the idea 
generalizes to other bundles. 


11.5.1 Connections 


The Lie and exterior derivatives require no structure beyond that which 
comes for free with our manifold. Another type of derivative that can act on 
tangent-space vectors and tensors is the covariant derivative $\nabla_X \equiv X^\mu\nabla_\mu$.
This requires an additional mathematical object called an affine connection.
The covariant derivative is defined by:
i) Its action on scalar functions as
$$ \nabla_X f = Xf. \tag{11.112} $$
ii) Its action on a basis set of tangent-vector fields $e_a(x) = e_a^\mu(x)\partial_\mu$ (a local
frame, or vielbein⁷) by introducing a set of functions $\omega^i{}_{jk}(x)$ and setting
$$ \nabla_{e_k} e_j = e_i\, \omega^i{}_{jk}. \tag{11.113} $$
iii) Extending this definition to any other type of tensor by requiring $\nabla_X$
to be a derivation.
iv) Requiring that the result of applying $\nabla_X$ to a tensor is a tensor of the
same type.
The set of functions $\omega^i{}_{jk}(x)$ is the connection. In any local co-ordinate chart
we can choose them at will, and different choices define different covariant
derivatives. (There may be global compatibility constraints, however, which
appear when we assemble the charts into an atlas.)

⁷In practice viel, “many”, is replaced by the appropriate German numeral: ein-, zwei-,
drei-, vier-, fünf-, ..., indicating the dimension. The word bein means “leg.”

Warning: Despite having the appearance of one, $\omega^i{}_{jk}$ is not a tensor. It
transforms inhomogeneously under a change of frame or co-ordinates; see
equation (11.132).

We can, of course, take as our basis vectors the co-ordinate vectors $e_\mu \equiv \partial_\mu$.
When we do this it is traditional to use the symbol $\Gamma$ for the co-ordinate
frame connection instead of $\omega$. Thus,
$$ \nabla_\mu e_\nu \equiv \nabla_{e_\mu} e_\nu = e_\lambda\, \Gamma^\lambda{}_{\nu\mu}. \tag{11.114} $$
The numbers $\Gamma^\lambda{}_{\nu\mu}$ are often called Christoffel symbols.
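As an example consider the covariant derivative of a vector $f^\nu e_\nu$. Using
the derivation property we have
$$ \begin{aligned} \nabla_\mu(f^\nu e_\nu) &= (\partial_\mu f^\nu)\, e_\nu + f^\nu \nabla_\mu e_\nu \\ &= (\partial_\mu f^\nu)\, e_\nu + f^\nu e_\lambda \Gamma^\lambda{}_{\nu\mu} \\ &= e_\nu \left\{ \partial_\mu f^\nu + f^\lambda \Gamma^\nu{}_{\lambda\mu} \right\}. \end{aligned} \tag{11.115} $$
In the first line we have used the defining property that $\nabla_{e_\mu}$ acts on the
functions $f^\nu$ as $\partial_\mu$, and in the last line we interchanged the dummy indices
$\nu$ and $\lambda$. We often abuse the notation by writing only the components, and
set
$$ \nabla_\mu f^\nu = \partial_\mu f^\nu + f^\lambda \Gamma^\nu{}_{\lambda\mu}. \tag{11.116} $$
Similarly, acting on the components of a mixed tensor, we would write
$$ \nabla_\mu A^\alpha{}_{\beta\gamma} = \partial_\mu A^\alpha{}_{\beta\gamma} + \Gamma^\alpha{}_{\lambda\mu} A^\lambda{}_{\beta\gamma} - \Gamma^\lambda{}_{\beta\mu} A^\alpha{}_{\lambda\gamma} - \Gamma^\lambda{}_{\gamma\mu} A^\alpha{}_{\beta\lambda}. \tag{11.117} $$
When we use this notation, we are no longer regarding the tensor components
as “functions.”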

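Observe that the plus and minus signs in (11.117) are required so that,
for example, the covariant derivative of the scalar function $f_\alpha g^\alpha$ is
$$ \begin{aligned} \nabla_\mu(f_\alpha g^\alpha) &= \partial_\mu(f_\alpha g^\alpha) \\ &= (\partial_\mu f_\alpha)\, g^\alpha + f_\alpha (\partial_\mu g^\alpha) \\ &= (\partial_\mu f_\alpha - f_\lambda \Gamma^\lambda{}_{\alpha\mu})\, g^\alpha + f_\alpha (\partial_\mu g^\alpha + g^\lambda \Gamma^\alpha{}_{\lambda\mu}) \\ &= (\nabla_\mu f_\alpha)\, g^\alpha + f_\alpha (\nabla_\mu g^\alpha), \end{aligned} \tag{11.118} $$
and so satisfies the derivation property.


Parallel transport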


We have defined the covariant derivative via its formal calculus properties.
It has, however, a geometrical interpretation. As with the Lie derivative, in
order to compute the derivative along $X$ of the vector field $Y$, we have to
somehow carry the vector $Y(x)$ from the tangent space $TM_x$ to the tangent
space $TM_{x+\epsilon X}$, where we can subtract it from $Y(x + \epsilon X)$. The Lie derivative
carries $Y$ along with the $X$ flow. The covariant derivative implicitly carries
$Y$ by “parallel transport”. If $\gamma : s \mapsto x^\mu(s)$ is a parameterized curve with
tangent vector $X^\mu\partial_\mu$, where
$$ X^\mu = \frac{dx^\mu}{ds}, \tag{11.119} $$
then we say that the vector field $Y(x^\mu(s))$ is parallel transported along the
curve $\gamma$ if
$$ \nabla_X Y = 0, \tag{11.120} $$
at each point $x^\mu(s)$. Thus, a vector that in the vielbein frame $e_i$ at $x$ has
components $Y^i$ will, after being parallel transported to $x + \epsilon X$, end up with
components
$$ Y^i - \epsilon\, \omega^i{}_{jk} Y^j X^k. \tag{11.121} $$
In a co-ordinate frame, after parallel transport through an infinitesimal displacement
$\delta x^\mu$, the vector $Y^\nu\partial_\nu$ will have components
$$ Y^\nu \to Y^\nu - \Gamma^\nu{}_{\lambda\mu} Y^\lambda \delta x^\mu, \tag{11.122} $$
and so
$$ \begin{aligned} \delta x^\mu \nabla_\mu Y^\nu &= Y^\nu(x^\mu + \delta x^\mu) - \left\{ Y^\nu(x) - \Gamma^\nu{}_{\lambda\mu} Y^\lambda \delta x^\mu \right\} \\ &= \delta x^\mu \left\{ \partial_\mu Y^\nu + \Gamma^\nu{}_{\lambda\mu} Y^\lambda \right\}. \end{aligned} \tag{11.123} $$


Curvature and torsion 


As we said earlier, the connection $\omega^i{}_{jk}(x)$ is not itself a tensor. Two important
quantities which are tensors are associated with $\nabla_X$:
i) The torsion
$$ T(X,Y) = \nabla_X Y - \nabla_Y X - [X,Y]. \tag{11.124} $$
The quantity $T(X,Y)$ is a vector depending linearly on $X$, $Y$, so $T$ at
the point $x$ is a map $TM_x \times TM_x \to TM_x$, and so a tensor of type
$(1,2)$. In a co-ordinate frame it has components
$$ T^\lambda{}_{\mu\nu} = \Gamma^\lambda{}_{\nu\mu} - \Gamma^\lambda{}_{\mu\nu}. \tag{11.125} $$
ii) The Riemann curvature tensor
$$ R(X,Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z. \tag{11.126} $$
The quantity $R(X,Y)Z$ is also a vector, so $R(X,Y)$ is a linear map
$TM_x \to TM_x$, and thus $R$ itself is a tensor of type $(1,3)$. Written out
in a co-ordinate frame, we have
$$ R^\alpha{}_{\beta\mu\nu} = \partial_\mu \Gamma^\alpha{}_{\beta\nu} - \partial_\nu \Gamma^\alpha{}_{\beta\mu} + \Gamma^\alpha{}_{\lambda\mu} \Gamma^\lambda{}_{\beta\nu} - \Gamma^\alpha{}_{\lambda\nu} \Gamma^\lambda{}_{\beta\mu}. \tag{11.127} $$


If our manifold comes equipped with a metric tensor $g_{\mu\nu}$ (and is thus
a Riemann manifold), and if we require both that $T = 0$ and $\nabla_\mu g_{\alpha\beta} = 0$,
then the connection is uniquely determined, and is called the Riemann, or
Levi-Civita, connection. In a co-ordinate frame it is given by
$$ \Gamma^\alpha{}_{\mu\nu} = \frac{1}{2} g^{\alpha\lambda} \left( \partial_\mu g_{\lambda\nu} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu} \right). \tag{11.128} $$
This is the connection that appears in General Relativity.
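
Formula (11.128) is easy to evaluate numerically for a given metric. As a hedged illustration, the sketch below applies it to the unit-sphere metric $g = d\theta \otimes d\theta + \sin^2\theta\, d\phi \otimes d\phi$ of Exercise 11.2, estimating the metric derivatives by central differences. The function names and the sample point are invented for this example, and the comparison values $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \cot\theta$ are the standard results for the sphere rather than formulas quoted from this section.

```python
import math

def metric(x):
    """Unit-sphere metric g_{mu nu} in polar co-ordinates x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def d_metric(x, h=1e-6):
    """dg[lam][mu][nu] = d g_{mu nu} / d x^lam, by central differences."""
    dg = []
    for lam in range(2):
        xp, xm = list(x), list(x)
        xp[lam] += h
        xm[lam] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(2)]
                   for m in range(2)])
    return dg

def christoffel(x):
    """G[a][m][n] = Gamma^a_{mn}, evaluated from equation (11.128)."""
    ginv = inverse_2x2(metric(x))
    dg = d_metric(x)
    return [[[0.5 * sum(ginv[a][l] * (dg[m][l][n] + dg[n][m][l] - dg[l][m][n])
                        for l in range(2))
              for n in range(2)]
             for m in range(2)]
            for a in range(2)]

theta0 = 0.9
G = christoffel([theta0, 0.3])
print(G[0][1][1], -math.sin(theta0) * math.cos(theta0))   # Gamma^theta_{phi phi}
print(G[1][0][1], 1.0 / math.tan(theta0))                  # Gamma^phi_{theta phi}
```

All other components come out as zero to within the finite-difference error, and the symmetry `G[a][m][n] == G[a][n][m]` reflects the torsion-free condition.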

The curvature tensor measures the degree of path dependence in parallel
transport: if $Y^\nu(x)$ is parallel transported along a path $\gamma : s \mapsto x^\mu(s)$ from
$a$ to $b$, and if we deform $\gamma$ so that $x^\mu(s) \to x^\mu(s) + \delta x^\mu(s)$ while keeping the
endpoints $a$, $b$ fixed, then
$$ \delta Y^\alpha(b) = -\int_a^b R^\alpha{}_{\beta\mu\nu}(x)\, Y^\beta(x)\, \delta x^\mu\, dx^\nu. \tag{11.129} $$
If $R^\alpha{}_{\beta\mu\nu} \equiv 0$ then the effect of parallel transport from $a$ to $b$ will be independent
of the route taken.

The geometric interpretation of $T^\lambda{}_{\mu\nu}$ is less transparent. On a two-dimensional
surface a connection is torsion free when the tangent space “rolls without
slipping” along the curve $\gamma$.


Exercise 11.12: Metric compatibility. Show that the Riemann connection
$$ \Gamma^\alpha{}_{\mu\nu} = \frac{1}{2} g^{\alpha\lambda} \left( \partial_\mu g_{\lambda\nu} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu} \right), $$
follows from the torsion-free condition $\Gamma^\alpha{}_{\mu\nu} = \Gamma^\alpha{}_{\nu\mu}$ together with the metric
compatibility condition
$$ \nabla_\mu g_{\alpha\beta} \equiv \partial_\mu g_{\alpha\beta} - \Gamma^\nu{}_{\alpha\mu}\, g_{\nu\beta} - \Gamma^\nu{}_{\beta\mu}\, g_{\alpha\nu} = 0. $$
Show that “metric compatibility” means that the operation of raising or
lowering indices commutes with covariant derivation.

Exercise 11.13: Geodesic equation. Let $\gamma : s \mapsto x^\mu(s)$ be a parametrized
path from $a$ to $b$. Show that the Euler-Lagrange equation that follows from
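minimizing the distance functional
$$ S(\gamma) = \int_a^b \sqrt{g_{\mu\nu}\, \dot x^\mu \dot x^\nu}\; ds, $$
where the dots denote differentiation with respect to the parameter $s$, is
$$ \frac{d^2 x^\mu}{ds^2} + \Gamma^\mu{}_{\alpha\beta} \frac{dx^\alpha}{ds} \frac{dx^\beta}{ds} = 0. $$
Here $\Gamma^\mu{}_{\alpha\beta}$ is the Riemann connection (11.128).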


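Exercise 11.14: Show that if $A^\mu$ is a vector field then, for the Riemann connection,
$$ \nabla_\mu A^\mu = \frac{1}{\sqrt{g}} \frac{\partial (\sqrt{g}\, A^\mu)}{\partial x^\mu}. $$
In other words, show that
$$ \Gamma^\alpha{}_{\alpha\mu} = \frac{1}{\sqrt{g}} \frac{\partial \sqrt{g}}{\partial x^\mu}. $$
Deduce that the Laplacian acting on a scalar field $\phi$ can be defined by setting
either
$$ \nabla^2\phi = g^{\mu\nu} \nabla_\mu \nabla_\nu \phi, $$
or
$$ \nabla^2\phi = \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^\mu} \left( \sqrt{g}\, g^{\mu\nu} \frac{\partial \phi}{\partial x^\nu} \right), $$
the two definitions being equivalent.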


11.5.2 Cartan’s form viewpoint 


Let $e^{*j}(x) = e^{*j}{}_\mu(x)\, dx^\mu$ be the basis of one-forms dual to the vielbein frame
$e_i(x) = e_i^\mu(x)\partial_\mu$. Since
$$ \delta^i_j = e^{*i}(e_j) = e^{*i}{}_\mu\, e_j^\mu, \tag{11.130} $$
the matrices $e^{*i}{}_\mu$ and $e_i^\mu$ are inverses of one-another. We can use them to
change from roman vielbein indices to greek co-ordinate frame indices. For
example:
$$ g_{ij} = g(e_i, e_j) = e_i^\mu\, g_{\mu\nu}\, e_j^\nu, \tag{11.131} $$
and
$$ \omega^i{}_{jk} = e^{*i}{}_\nu (\partial_\mu e_j^\nu)\, e_k^\mu + e^{*i}{}_\lambda\, e_j^\nu e_k^\mu\, \Gamma^\lambda{}_{\nu\mu}. \tag{11.132} $$
Cartan regards the connection as being a matrix $\Omega$ of one-forms with
matrix entries $\omega^i{}_j = \omega^i{}_{j\mu}\, dx^\mu$. In this language equation (11.113) becomes
$$ \nabla_X e_j = e_i\, \omega^i{}_j(X). \tag{11.133} $$


Cartan’s viewpoint separates off the index $\mu$, which refers to the direction
$\delta x^{\mu} \propto X^{\mu}$ in which we are differentiating, from the matrix
indices $i$ and $j$ that act on the components of the vector or tensor being
differentiated. This separation becomes very natural when the vector space
spanned by the $e_i(x)$ is no longer the tangent space, but some other “internal”
vector space attached to the point $x$. Such internal spaces are common in
physics, an important example being the “colour space” of gauge field theories.
Physicists, following Hermann Weyl, call a connection on an internal space a
“gauge potential.” To mathematicians it is simply a connection on the vector
bundle that has the internal spaces as its fibres.

Cartan also regards the torsion $T$ and curvature $R$ as forms; in this case
vector- and matrix-valued two-forms, respectively, with entries

$$T^{i} = \frac{1}{2}\, T^{i}{}_{\mu\nu}\, dx^{\mu} dx^{\nu}, \tag{11.134}$$

$$R^{i}{}_{k} = \frac{1}{2}\, R^{i}{}_{k\mu\nu}\, dx^{\mu} dx^{\nu}. \tag{11.135}$$

In his form language the equations defining the torsion and curvature become
Cartan’s structure equations:

$$de^{*i} + \omega^{i}{}_{j} \wedge e^{*j} = T^{i}, \tag{11.136}$$

$$d\omega^{i}{}_{k} + \omega^{i}{}_{j} \wedge \omega^{j}{}_{k} = R^{i}{}_{k}. \tag{11.137}$$

The last equation can be written more compactly as

$$d\Omega + \Omega \wedge \Omega = R. \tag{11.138}$$

From this, by taking the exterior derivative, we obtain the Bianchi identity

$$dR - R \wedge \Omega + \Omega \wedge R = 0. \tag{11.139}$$

On a Riemann manifold, we can take the vielbein frame $e_i$ to be orthonormal.
In this case the roman-index metric $g_{ij} = g(e_i, e_j)$ becomes $\delta_{ij}$. There
is then no distinction between covariant and contravariant roman indices,
and the connection and curvature forms, $\Omega$, $R$, being infinitesimal rotations,
become skew symmetric matrices:

$$\omega_{ij} = -\omega_{ji}, \qquad R_{ij} = -R_{ji}. \tag{11.140}$$


11.6 Further exercises and problems 


Exercise 11.15: Consider the vector fields $X = y\,\partial_x$, $Y = \partial_y$ in
$\mathbb{R}^2$. Find the flows associated with these fields, and use them to verify
the statements made in section 11.2.1 about the geometric interpretation of the
Lie bracket.


Exercise 11.16: Show that the pair of vector fields $L_z = x\,\partial_y - y\,\partial_x$
and $L_y = z\,\partial_x - x\,\partial_z$ in $\mathbb{R}^3$ is in involution wherever they
are both non-zero. Show further that the general solution of the system of
partial differential equations

$$(x\,\partial_y - y\,\partial_x) f = 0,$$
$$(z\,\partial_x - x\,\partial_z) f = 0,$$

in $\mathbb{R}^3$ is $f(x,y,z) = F(x^2 + y^2 + z^2)$, where $F$ is an arbitrary function.


Exercise 11.17: In the rolling conditions (11.29) we are using the “Y” convention
for Euler angles. In this convention $\theta$ and $\phi$ are the usual spherical polar
co-ordinate angles with respect to the space-fixed $xyz$ axes. They specify the
direction of the body-fixed $Z$ axis about which we make the final $\psi$ rotation
(see figure 11.7).


a) Show that (11.29) are indeed the no-slip rolling conditions

$$\dot{x} = \omega_y, \qquad \dot{y} = -\omega_x, \qquad 0 = \omega_z,$$

where $(\omega_x, \omega_y, \omega_z)$ are the components of the ball’s angular velocity in
the $xyz$ space-fixed frame.

b) Solve the three constraints in (11.29) so as to obtain the vector fields
$\mathrm{roll}_x$, $\mathrm{roll}_y$ of (11.30).


Figure 11.7: The “Y” convention for Euler angles. The $XYZ$ axes are fixed
to the ball, and the $xyz$ axes are fixed in space. We first rotate the ball
through an angle $\phi$ about the $z$ axis, thus taking $y \to Y'$, then through $\theta$
about $Y'$, and finally through $\psi$ about $Z$, so taking $Y' \to Y$.


c) Show that

$$[\mathrm{roll}_x, \mathrm{roll}_y] = -\mathrm{spin}_z,$$

where $\mathrm{spin}_z = \partial_\phi$ corresponds to a rotation about a vertical axis
through the point of contact. This is a new motion, being forbidden by the
$\omega_z = 0$ condition.

d) Show that

$$[\mathrm{spin}_z, \mathrm{roll}_x] = \mathrm{spin}_x,$$
$$[\mathrm{spin}_z, \mathrm{roll}_y] = \mathrm{spin}_y,$$

where the new vector fields

$$\mathrm{spin}_x = -(\mathrm{roll}_y - \partial_y),$$
$$\mathrm{spin}_y = (\mathrm{roll}_x - \partial_x),$$

correspond to rotations of the ball about the space-fixed $x$ and $y$ axes
through its centre, and with the centre of mass held fixed.


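We have generated five independent vector fields from the original two.
Therefore, by sufficient rolling to-and-fro, we can position the ball anywhere
on the table, and in any orientation.


Exercise 11.18: The semi-classical dynamics of charge $-e$ electrons in a
magnetic solid are governed by the equations⁸

$$\dot{\mathbf{r}} = \frac{\partial \epsilon(\mathbf{k})}{\partial \mathbf{k}} - \dot{\mathbf{k}} \times \mathbf{\Omega},$$
$$\dot{\mathbf{k}} = -\frac{\partial V}{\partial \mathbf{r}} - e\,\dot{\mathbf{r}} \times \mathbf{B}.$$

Here $\mathbf{k}$ is the Bloch momentum of the electron, $\mathbf{r}$ is its position,
$\epsilon(\mathbf{k})$ its band energy (in the extended-zone scheme), and $\mathbf{B}(\mathbf{r})$
is the external magnetic field. The components $\Omega_i$ of the Berry curvature
$\mathbf{\Omega}(\mathbf{k})$ are given in terms of the periodic part $|u(\mathbf{k})\rangle$ of the
Bloch wavefunctions of the band by

$$\Omega_i = i\,\epsilon_{ijk}\, \frac{1}{2} \left( \left\langle \frac{\partial u}{\partial k_j} \middle| \frac{\partial u}{\partial k_k} \right\rangle - \left\langle \frac{\partial u}{\partial k_k} \middle| \frac{\partial u}{\partial k_j} \right\rangle \right).$$

The only property of $\mathbf{\Omega}(\mathbf{k})$ needed for the present problem, however,
is that $\mathrm{div}_{\mathbf{k}}\, \mathbf{\Omega} = 0$.


a) Show that these equations are Hamiltonian, with

$$H(\mathbf{r}, \mathbf{k}) = \epsilon(\mathbf{k}) + V(\mathbf{r})$$

and with

$$\omega = dk_i\, dx_i - \frac{e}{2}\, \epsilon_{ijk} B_i(\mathbf{r})\, dx_j\, dx_k + \frac{1}{2}\, \epsilon_{ijk}\, \Omega_i(\mathbf{k})\, dk_j\, dk_k$$

as the symplectic form.⁹

b) Confirm that the $\omega$ defined in part a) is closed, and that the Poisson
brackets are given by

$$\{x_i, x_j\} = -\frac{\epsilon_{ijk}\,\Omega_k}{(1 + e\,\mathbf{B}\cdot\mathbf{\Omega})},$$
$$\{x_i, k_j\} = -\frac{\delta_{ij} + e B_i \Omega_j}{(1 + e\,\mathbf{B}\cdot\mathbf{\Omega})},$$
$$\{k_i, k_j\} = \frac{\epsilon_{ijk}\, e B_k}{(1 + e\,\mathbf{B}\cdot\mathbf{\Omega})}.$$

c) Show that the conserved phase-space volume $\omega^3/3!$ is equal to

$$(1 + e\,\mathbf{B}\cdot\mathbf{\Omega})\, d^3k\, d^3x,$$

instead of the naively expected $d^3k\, d^3x$.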


⁸ M. C. Chang, Q. Niu, Phys. Rev. Lett. 75 (1995) 1348.
⁹ C. Duval, Z. Horvath, P. A. Horvathy, L. Martina, P. C. Stichel, Modern Physics
Letters B 20 (2006) 373.




The following pair of exercises show that Cartan’s expression for the curva- 
ture tensor remains valid for covariant differentiation in “internal” spaces. 
There is, however, no natural concept analogous to the torsion tensor for 
internal spaces. 


Exercise 11.19: Non-abelian gauge fields as matrix-valued forms. In a
non-abelian Yang-Mills gauge theory, such as QCD, the vector potential

$$A = A_\mu\, dx^\mu$$

is matrix-valued, meaning that the components $A_\mu$ are matrices which do not
necessarily commute with each other. (These matrices are elements of the Lie
algebra of the gauge group, but we won’t need this fact here.) The
matrix-valued curvature, or field-strength, 2-form $F$ is defined by

$$F = dA + A^2 = \frac{1}{2}\, F_{\mu\nu}\, dx^\mu dx^\nu.$$

Here a combined matrix and wedge product is to be understood:

$$(A^2)^a{}_b \equiv A^a{}_c \wedge A^c{}_b = A^a{}_{c\mu}\, A^c{}_{b\nu}\, dx^\mu dx^\nu.$$

i) Show that $A^2 = \frac{1}{2}[A_\mu, A_\nu]\, dx^\mu dx^\nu$, and hence show that

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu].$$

ii) Define the gauge-covariant derivatives

$$\nabla_\mu = \partial_\mu + A_\mu,$$

and show that the commutator $[\nabla_\mu, \nabla_\nu]$ of two of these is equal to
$F_{\mu\nu}$. Show further that if $X$, $Y$ are two vector fields with Lie bracket
$[X, Y]$ and $\nabla_X \equiv X^\mu \nabla_\mu$, then

$$F(X, Y) = [\nabla_X, \nabla_Y] - \nabla_{[X,Y]}.$$

iii) Show that $F$ obeys the Bianchi identity

$$dF - FA + AF = 0.$$

Again wedge and matrix products are to be understood. This equation
is the non-abelian version of the source-free Maxwell equation $dF = 0$.

iv) Show that, in any number of dimensions, the Bianchi identity implies
that the 4-form $\mathrm{tr}\,(F^2)$ is closed, i.e. that $d\,\mathrm{tr}\,(F^2) = 0$. Similarly
show that the $2n$-form $\mathrm{tr}\,(F^n)$ is closed. (Here the “tr” means a trace over
the roman matrix indices, and not over the greek space-time indices.)

v) Show that

$$\mathrm{tr}\,(F^2) = d\left\{ \mathrm{tr}\left( A\,dA + \frac{2}{3} A^3 \right) \right\}.$$

The 3-form $\mathrm{tr}\,(A\,dA + \frac{2}{3} A^3)$ is called a Chern-Simons form.


Exercise 11.20: Gauge transformations. Here we consider how the matrix-valued
vector potential transforms when we make a change of gauge. In other
words, we seek the non-abelian version of $A_\mu \to A_\mu + \partial_\mu \phi$.


i) Let $g$ be an invertible matrix, and $\delta g$ a matrix describing a small change
in $g$. Show that the corresponding change in the inverse matrix is given
by $\delta(g^{-1}) = -g^{-1}(\delta g)\,g^{-1}$.

ii) Show that under the gauge transformation

$$A \to A^g \equiv g^{-1} A g + g^{-1} dg,$$

we have $F \to g^{-1} F g$. (Hint: The labour is minimized by exploiting the
covariant derivative identity in part ii) of the previous exercise.)

iii) Deduce that $\mathrm{tr}\,(F^n)$ is gauge invariant.

iv) Show that a necessary condition for the matrix-valued gauge field $A$ to
be “pure gauge”, i.e. for there to be a position dependent matrix $g(x)$
such that $A = g^{-1} dg$, is that $F = 0$, where $F$ is the curvature two-form
of the previous exercise. (If we are working in a simply connected region,
then $F = 0$ is also a sufficient condition for there to be a $g$ such that
$A = g^{-1} dg$, but this is a little harder to prove.)


In a gauge theory based on a Lie group G, the matrices g will be elements of 
the group, or, more generally, they will form a matrix representation of the 
group. 


Chapter 12 


Integration on Manifolds 


One usually thinks of integration as requiring measure — a notion of volume, 
and hence of size and length, and so a metric. A metric, however, is not 
required for integrating differential forms. They come pre-equipped with 
whatever notion of length, area, or volume is required. 


12.1 Basic notions 


12.1.1 Line integrals 


Consider, for example, the form $df$. We want to try to give a meaning to the
symbol

$$I_1 = \int_\Gamma df. \tag{12.1}$$

Here, $\Gamma$ is a path in our space starting at some point $P_0$ and ending at the
point $P_1$. Any reasonable definition of $I_1$ should end up with the answer
that we would immediately write down if we saw an expression like $I_1$ in an
elementary calculus class. This answer is

$$I_1 = \int_\Gamma df = f(P_1) - f(P_0). \tag{12.2}$$

No notion of a metric is needed here. There is, however, a geometric picture of
what we have done. We draw in our space the surfaces $\ldots, f(x) = -1, f(x) = 0,
f(x) = 1, \ldots$, and perhaps fill in intermediate values if necessary. We
then start at $P_0$ and travel from there to $P_1$, keeping track of how many of
these surfaces we pass through (with sign $-1$, if we pass back through them).
The integral of $df$ is this number. Figure 12.1 illustrates a case in which

$$\int_\Gamma df = 5.5 - 1.5 = 4.$$


Figure 12.1: The integral of a one-form (level surfaces f = 1, 2, 3, 4, 5, 6 shown).


What we have defined is a signed integral. If we parametrize the path as
$x(s)$, $0 \le s \le 1$, and with $x(0) = P_0$, $x(1) = P_1$ we have

$$I_1 = \int_0^1 \left( \frac{df}{ds} \right) ds, \tag{12.3}$$

where the right hand side is an ordinary one-variable integral. It is important
that we did not write $\left| \frac{df}{ds} \right|$ in this integral. The absence of the
modulus sign ensures that if we partially retrace our route, so that we pass
over some part of $\Gamma$ three times (twice forward and once back), we obtain the
same answer as if we went only forward.
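
As a rough numerical illustration of (12.3), the following Python sketch sums the increments of $f$ along a path that doubles back on itself. The function $f(x) = x^2$, the path $x(s) = s + 0.3\sin(2\pi s)$ and the helper name `line_integral` are assumptions chosen for this example only; they do not come from the text.

```python
import math

def line_integral(f, path, steps=2000, signed=True):
    """Sum the increments of f along x(s), 0 <= s <= 1.

    With signed=True this approximates the integral of df. With
    signed=False it sums |df| instead, which is what a modulus sign
    inside the integral would compute.
    """
    total = 0.0
    for k in range(steps):
        s0, s1 = k / steps, (k + 1) / steps
        df = f(path(s1)) - f(path(s0))
        total += df if signed else abs(df)
    return total

def f(x):
    return x * x

def path(s):
    # Runs from x = 0 to x = 1 but moves backwards for a while near s = 1/2.
    return s + 0.3 * math.sin(2 * math.pi * s)

signed_value = line_integral(f, path, signed=True)
unsigned_value = line_integral(f, path, signed=False)

print(signed_value, f(path(1.0)) - f(path(0.0)))  # these two agree
print(unsigned_value)                             # larger: the retraced part is over-counted
```

The signed sum telescopes to $f(P_1) - f(P_0)$ however much the path wanders, whereas the sum with the modulus sign counts the retraced stretch three times.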


12.1.2 Skew-symmetry and orientations 


What about integrating 2 and 3-forms? Why the skew-symmetry? To answer
these questions, think about assigning some sort of “area” in $\mathbb{R}^2$ to the
parallelogram defined by the two vectors $\mathbf{x}$, $\mathbf{y}$. This is going to be some
function of the two vectors. Let us call it $\omega(\mathbf{x}, \mathbf{y})$. What properties do
we demand of this function? There are at least three:


i) Scaling: If we double the length of one of the vectors, we expect the area
to double. Generalizing this, we demand that
$\omega(\lambda\mathbf{x}, \mu\mathbf{y}) = (\lambda\mu)\,\omega(\mathbf{x}, \mathbf{y})$.
(Note that we are not putting modulus signs on the lengths, so we are
allowing negative “areas”, and for the sign to change when we reverse
the direction of a vector.)

ii) Additivity: The drawing in figure 12.2 shows that we ought to have

$$\omega(\mathbf{x}_1 + \mathbf{x}_2, \mathbf{y}) = \omega(\mathbf{x}_1, \mathbf{y}) + \omega(\mathbf{x}_2, \mathbf{y}), \tag{12.4}$$

and similarly for the second slot.


Figure 12.2: Additivity of $\omega(\mathbf{x}, \mathbf{y})$.


iii) Degeneration: If the two sides coincide, the area should be zero. Thus
$\omega(\mathbf{x}, \mathbf{x}) = 0$.

The first two properties show that $\omega$ should be a multilinear form. The
third shows that it must be skew-symmetric:

$$0 = \omega(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) = \omega(\mathbf{x}, \mathbf{x}) + \omega(\mathbf{x}, \mathbf{y}) + \omega(\mathbf{y}, \mathbf{x}) + \omega(\mathbf{y}, \mathbf{y}) = \omega(\mathbf{x}, \mathbf{y}) + \omega(\mathbf{y}, \mathbf{x}). \tag{12.5}$$

So we have

$$\omega(\mathbf{x}, \mathbf{y}) = -\omega(\mathbf{y}, \mathbf{x}). \tag{12.6}$$


These are exactly the properties possessed by a 2-form. Similarly, a 3-form 
outputs a volume element. 

These volume elements are oriented. Remember that an orientation of a 
set of vectors is a choice of order in which to write them. If we interchange 
two vectors, the orientation changes sign. We do not distinguish orientations 
related by an even number of interchanges. A p-form assigns a signed (±)
p-dimensional volume element to an orientated set of vectors. If we change
the orientation, we change the sign of the volume element.
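
The three demands made of $\omega$ can be checked on a concrete candidate. The sketch below assumes the standard signed-area function on the plane, $\omega(\mathbf{x}, \mathbf{y}) = x^1 y^2 - x^2 y^1$; the function name `area_form` and the sample vectors are illustrative choices, not taken from the text.

```python
def area_form(x, y):
    """Signed area of the parallelogram spanned by the plane vectors x and y."""
    return x[0] * y[1] - x[1] * y[0]

x, x2, y = (2, 1), (0, 2), (1, 4)

# i) Scaling: omega(lambda x, mu y) = lambda * mu * omega(x, y), signs included.
lam, mu = 3, -2
scaled = area_form((lam * x[0], lam * x[1]), (mu * y[0], mu * y[1]))
print(scaled, lam * mu * area_form(x, y))

# ii) Additivity in the first slot.
x_sum = (x[0] + x2[0], x[1] + x2[1])
print(area_form(x_sum, y), area_form(x, y) + area_form(x2, y))

# iii) Degeneration, and the skew-symmetry that follows from it.
print(area_form(x, x))
print(area_form(x, y), area_form(y, x))
```

Interchanging the two arguments flips the sign, which is the orientation reversal described above.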


Orientable and non-orientable manifolds 


In the classic video game Asteroids, you could select periodic boundary
conditions so that your spaceship would leave the right-hand side of the screen
and re-appear on the left. The game universe was topologically a torus $T^2$.
Suppose that we modify the game code so that each bit of the spaceship
re-appears at the point diametrically opposite the point it left. This does not
seem like a drastic change until you play a game with a left-hand-drive (US)
spaceship. If you send the ship off the screen and watch as it re-appears on
the opposite side, you will observe the ship transmogrify into a
right-hand-drive (British) craft. If we ourselves made such an excursion, we
would end up starving to death because all our left-handed digestive enzymes
would have been converted to right-handed ones. The game-space we have
constructed is topologically equivalent to the real projective plane
$\mathbb{R}P^2$. The lack of a global notion of being left or right-handed makes it
an example of a non-orientable manifold.


Figure 12.3: A spaceship leaves one side of the screen and returns on the other
with a) torus boundary conditions, b) projective-plane ($\mathbb{R}P^2$) boundary
conditions. Observe how, in case b), the spaceship has changed from being
left-handed to being right-handed.

A manifold or surface is orientable if we can choose a global orientation
for the tangent bundle. The simplest way to do this would be to find a
smoothly varying set of basis-vector fields, $e_\mu(x)$, on the surface and define
the orientation by choosing an order, $e_1(x), e_2(x), \ldots, e_d(x)$, in which to
write them. In general, however, a globally-defined smooth basis will not exist
(try to construct one for the two-sphere, $S^2$!). We will, however, be able
to find a continuously varying orientated basis
$e_1^{(i)}(x), e_2^{(i)}(x), \ldots, e_d^{(i)}(x)$ for each member, labelled by $(i)$, of an
atlas of coordinate charts. We should choose the charts so that the
intersection of any pair forms a connected set. Assuming that this has been
done, the orientation of a pair of overlapping charts is said to coincide if
the determinant, $\det A$, of the map $e_\mu^{(i)} = A_\mu^{\nu}\, e_\nu^{(j)}$ relating the
bases in the region of overlap, is positive.¹ If bases can be chosen
so that all overlap determinants are positive, the manifold is orientable and
the selected bases define the orientation. If bases cannot be so chosen, the
manifold or surface is non-orientable.


Exercise 12.1: Consider a three-dimensional ball $B^3$ with diametrically
opposite points of its surface identified. What would happen to an aircraft flying
through the surface of the ball? Would it change handedness, turn inside out, 
or simply turn upside down? Is this ball an orientable 3-manifold? 


12.2 Integrating p-forms 


A p-form is naturally integrated over an oriented p-dimensional surface or 
manifold. Rather than start with an abstract definition, we will first explain 
this pictorially, and then translate the pictures into mathematics. 


12.2.1 Counting boxes 


To visualize integrating 2-forms let us try to make sense of

$$\int_\Omega df\, dg, \tag{12.7}$$

where $\Omega$ is an oriented two-dimensional surface embedded in three
dimensions. The surfaces $f = \text{const.}$ and $g = \text{const.}$ break the space up
into a series of tubes. The oriented surface $\Omega$ cuts these tubes in a
two-dimensional mesh of (oriented) parallelograms as shown in figure 12.4.

We define an integral by counting how many parallelograms (including
fractions of a parallelogram) there are, taking the number to be positive if the
parallelogram given by the mesh is oriented in the same way as the surface,
and negative otherwise. To compute

$$\int_\Omega h\, df\, dg \tag{12.8}$$

we do the same, but weight each parallelogram by the value of $h$ at that
point. The integral $\int_\Omega f\, dx\, dy$ over a region in $\mathbb{R}^2$ thus ends up being
the number we would compute in a multivariate calculus class, but the integral
$\int_\Omega f\, dy\, dx$ would be minus this. Similarly we compute

$$\int_\Xi df\, dg\, dh \tag{12.9}$$

of the 3-form $df\, dg\, dh$ over the oriented volume $\Xi$, by counting how many
boxes defined by the level surfaces of $f$, $g$, $h$ are included in $\Xi$.


Figure 12.4: The integration region cuts the tubes into parallelograms.


¹ The determinant will have the same sign in the entire overlap region. If it did not,
continuity and connectedness would force it to be zero somewhere, implying that one of
the putative bases was not linearly independent there.

An equivalent way of thinking of the integral of a p-form uses its definition
as a skew-symmetric p-linear function. Accordingly we evaluate

$$I_2 = \int_\Omega \omega, \tag{12.10}$$

where $\omega$ is a 2-form, and $\Omega$ is an oriented 2-surface, by plugging vectors
into $\omega$. In figure 12.5 we show a tiling of the surface $\Omega$ by a collection of
(infinitesimal) parallelograms, each defined by an oriented pair of vectors
$\mathbf{v}_1$ and $\mathbf{v}_2$ that lie in the tangent space at one corner point $x$ of the
parallelogram. At each point $x$, we insert these vectors into the 2-form (in the
order specified by their orientation) to get $\omega(\mathbf{v}_1, \mathbf{v}_2)$, and then sum the
resulting numbers over all the parallelograms to get $I_2$. Similarly, we
integrate a p-form over an oriented p-dimensional region by decomposing the
region into infinitesimal p-dimensional oriented parallelepipeds, inserting
their defining vectors into the form, and summing their contributions.


Figure 12.5: A tiling of $\Omega$ with small oriented parallelograms.


12.2.2 Relation to conventional integrals 


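In the previous section we explained how to think pictorially about the
integral. Here, we interpret the pictures as multi-variable calculus.

We begin by motivating our recipe by considering a change of variables
in an integral in $\mathbb{R}^2$. Suppose we set $x^1 = x^1(y^1, y^2)$,
$x^2 = x^2(y^1, y^2)$ in

$$I = \int_\Omega f(x)\, dx^1 dx^2, \tag{12.11}$$

and use

$$dx^1 = \frac{\partial x^1}{\partial y^1}\, dy^1 + \frac{\partial x^1}{\partial y^2}\, dy^2, \qquad dx^2 = \frac{\partial x^2}{\partial y^1}\, dy^1 + \frac{\partial x^2}{\partial y^2}\, dy^2. \tag{12.12}$$

Since $dy^1 dy^2 = -dy^2 dy^1$, we have

$$dx^1 dx^2 = \left( \frac{\partial x^1}{\partial y^1} \frac{\partial x^2}{\partial y^2} - \frac{\partial x^2}{\partial y^1} \frac{\partial x^1}{\partial y^2} \right) dy^1 dy^2. \tag{12.13}$$

Thus

$$\int_\Omega f(x)\, dx^1 dx^2 = \int_{\Omega'} f(x(y))\, \frac{\partial(x^1, x^2)}{\partial(y^1, y^2)}\, dy^1 dy^2, \tag{12.14}$$

where $\frac{\partial(x^1, x^2)}{\partial(y^1, y^2)}$ is the Jacobian determinant

$$\frac{\partial(x^1, x^2)}{\partial(y^1, y^2)} = \left( \frac{\partial x^1}{\partial y^1} \frac{\partial x^2}{\partial y^2} - \frac{\partial x^2}{\partial y^1} \frac{\partial x^1}{\partial y^2} \right), \tag{12.15}$$

and $\Omega'$ is the integration region in the new variables. There is therefore no
need to include an explicit Jacobian factor when changing variables in an
integral of a p-form over a p-dimensional space: it comes for free with the
form.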
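
The way the Jacobian emerges from the skew-symmetry alone can be seen in a small Python sketch. Here a one-form $a_1\,dy^1 + a_2\,dy^2$ is stored as the pair of its coefficients, and the helper names `wedge` and `polar_one_forms`, together with the choice of plane polar co-ordinates $(y^1, y^2) = (r, \theta)$, are assumptions made for the illustration.

```python
import math

def wedge(a, b):
    """Coefficient of dy1 dy2 in the product of the one-forms
    a = a[0] dy1 + a[1] dy2 and b = b[0] dy1 + b[1] dy2,
    using dy1 dy1 = dy2 dy2 = 0 and dy2 dy1 = -dy1 dy2."""
    return a[0] * b[1] - a[1] * b[0]

def polar_one_forms(r, theta):
    """dx and dy for x = r cos(theta), y = r sin(theta), as (dr, dtheta) coefficients."""
    dx = (math.cos(theta), -r * math.sin(theta))
    dy = (math.sin(theta), r * math.cos(theta))
    return dx, dy

r, theta = 2.0, 0.7
dx, dy = polar_one_forms(r, theta)

print(wedge(dx, dy))  # coefficient of dr dtheta in dx dy, equal to r
print(wedge(dy, dx))  # dy dx carries the opposite sign
```

No Jacobian was inserted by hand: the factor $r$ of $dx\,dy = r\,dr\,d\theta$ comes out of multiplying the one-forms.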

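This observation leads us to the general prescription: To evaluate
$\int_\Omega \omega$, the integral of a p-form

$$\omega = \frac{1}{p!}\, \omega_{\mu_1 \mu_2 \ldots \mu_p}\, dx^{\mu_1} \cdots dx^{\mu_p} \tag{12.16}$$

over the region $\Omega$ of a p dimensional surface in a $d \ge p$ dimensional space,
substitute a parameterization

$$x^1 = x^1(\xi^1, \xi^2, \ldots, \xi^p),$$
$$\vdots$$
$$x^d = x^d(\xi^1, \xi^2, \ldots, \xi^p), \tag{12.17}$$

of the surface into $\omega$. Next, use

$$dx^\mu = \frac{\partial x^\mu}{\partial \xi^i}\, d\xi^i, \tag{12.18}$$

so that

$$\omega \to \omega(x(\xi))_{i_1 i_2 \ldots i_p}\, \frac{\partial x^{i_1}}{\partial \xi^1} \cdots \frac{\partial x^{i_p}}{\partial \xi^p}\, d\xi^1 \cdots d\xi^p, \tag{12.19}$$

which we regard as a p-form on $\Omega$. (Our customary $1/p!$ is absent here
because we have chosen a particular order for the $d\xi$’s.) Then

$$\int_\Omega \omega \stackrel{\mathrm{def}}{=} \int \omega(x(\xi))_{i_1 i_2 \ldots i_p}\, \frac{\partial x^{i_1}}{\partial \xi^1} \cdots \frac{\partial x^{i_p}}{\partial \xi^p}\, d\xi^1 \cdots d\xi^p, \tag{12.20}$$

where the right hand side is an ordinary multiple integral. This recipe is a
generalization of the formula (12.3), which reduced the integral of a one-form
to an ordinary single-variable integral. Because the appropriate Jacobian
factor appears automatically, the numerical value of the integral does not
depend on the choice of parameterization of the surface.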

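Example: To integrate the 2-form $x\, dy\, dz$ over the surface of a two
dimensional sphere of radius $R$, we parametrize the surface with polar angles as

$$x = R \sin\phi \sin\theta,$$
$$y = R \cos\phi \sin\theta,$$
$$z = R \cos\theta. \tag{12.21}$$

Then

$$dy = -R \sin\phi \sin\theta\, d\phi + R \cos\phi \cos\theta\, d\theta,$$
$$dz = -R \sin\theta\, d\theta, \tag{12.22}$$

and so

$$x\, dy\, dz = R^3 \sin^2\phi\, \sin^3\theta\, d\phi\, d\theta. \tag{12.23}$$

We therefore evaluate

$$\int_{\text{sphere}} x\, dy\, dz = R^3 \int_0^{2\pi}\!\! \int_0^{\pi} \sin^2\phi\, \sin^3\theta\, d\phi\, d\theta = R^3 \int_0^{2\pi} \sin^2\phi\, d\phi \int_0^{\pi} \sin^3\theta\, d\theta = R^3 \pi \int_{-1}^{1} (1 - \cos^2\theta)\, d\cos\theta = \frac{4}{3}\pi R^3. \tag{12.24}$$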
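
The value in (12.24) can be checked numerically by following the recipe (12.20) literally: substitute the parameterization, form the coefficient of $d\phi\, d\theta$, and do an ordinary double integral. The sketch below is only illustrative; the function name `integrate_x_dydz`, the midpoint rule and the grid size are assumptions, not part of the text.

```python
import math

def integrate_x_dydz(R, n=200):
    """Midpoint-rule estimate of the integral of x dy dz over a sphere of radius R,
    using x = R sin(phi) sin(theta), y = R cos(phi) sin(theta), z = R cos(theta)."""
    dphi = 2 * math.pi / n
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x = R * math.sin(phi) * math.sin(theta)
            # Partial derivatives of y and z with respect to (phi, theta).
            dy_dphi = -R * math.sin(phi) * math.sin(theta)
            dy_dtheta = R * math.cos(phi) * math.cos(theta)
            dz_dphi = 0.0
            dz_dtheta = -R * math.sin(theta)
            # Coefficient of dphi dtheta in the two-form dy dz.
            dydz = dy_dphi * dz_dtheta - dy_dtheta * dz_dphi
            total += x * dydz * dphi * dtheta
    return total

R = 2.0
value = integrate_x_dydz(R)
print(value, 4 * math.pi * R**3 / 3)
```

The two printed numbers should agree closely, the small difference being the discretization error of the midpoint rule.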


The volume form 


Although we do not need any notion of length to integrate a differential form,
a p-dimensional surface embedded or immersed in $\mathbb{R}^d$ does inherit a distance
scale from the $\mathbb{R}^d$ Euclidean metric, and this can be used to define the area
or volume of the surface. When the Cartesian co-ordinates $x^1, \ldots, x^d$ of
a point in the surface are given as $x^a(\xi^1, \ldots, \xi^p)$, where the
$\xi^1, \ldots, \xi^p$ are co-ordinates on the surface, then the inherited, or
induced, metric is

$$\text{“}ds^2\text{”} \equiv g(\ ,\ ) = g_{\mu\nu}\, d\xi^\mu \otimes d\xi^\nu, \tag{12.25}$$

where

$$g_{\mu\nu} = \sum_{a=1}^{d} \frac{\partial x^a}{\partial \xi^\mu} \frac{\partial x^a}{\partial \xi^\nu}. \tag{12.26}$$

The volume form associated with the induced metric is

$$d(\text{Volume}) = \sqrt{g}\; d\xi^1 \cdots d\xi^p, \tag{12.27}$$

where $g = \det(g_{\mu\nu})$. The integral of this p-form over a p-dimensional region
gives the area, or p-dimensional volume, of the region.

If we change the parameterization of the surface from $\xi^\mu$ to $\zeta^\mu$, neither
the $d\xi^1 \cdots d\xi^p$ nor the $\sqrt{g}$ are separately invariant, but the Jacobian
arising from the change of the p-form,
$d\xi^1 \cdots d\xi^p \to d\zeta^1 \cdots d\zeta^p$, cancels against the factor coming from
the transformation law of the metric tensor $g_{\mu\nu} \to g'_{\mu\nu}$, leading to

$$\sqrt{g}\; d\xi^1 \cdots d\xi^p = \sqrt{g'}\; d\zeta^1 \cdots d\zeta^p. \tag{12.28}$$

The volume of the surface is therefore independent of the co-ordinate system
used to evaluate it.
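Example: The induced metric on the surface of a unit-radius two-sphere
embedded in $\mathbb{R}^3$ is, expressed in polar angles,

$$\text{“}ds^2\text{”} \equiv g(\ ,\ ) = d\theta \otimes d\theta + \sin^2\theta\; d\phi \otimes d\phi.$$

Thus

$$g = \begin{vmatrix} 1 & 0 \\ 0 & \sin^2\theta \end{vmatrix} = \sin^2\theta,$$

and

$$d(\text{Area}) = \sin\theta\; d\theta\, d\phi.$$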
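
The construction (12.25)-(12.27) can be mimicked numerically: estimate the derivatives of the embedding by finite differences, assemble the induced metric, and integrate $\sqrt{g}$. The sketch below is a minimal illustration under stated assumptions; the helper names, the central-difference step `eps` and the midpoint grid are choices made here, not taken from the text.

```python
import math

def embedding(theta, phi):
    """Cartesian co-ordinates of a point on the unit two-sphere."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def induced_metric(embed, u, v, eps=1e-5):
    """Components (g_uu, g_uv, g_vv) of the induced metric, as in (12.26),
    with the partial derivatives estimated by central differences."""
    du = [(a - b) / (2 * eps) for a, b in zip(embed(u + eps, v), embed(u - eps, v))]
    dv = [(a - b) / (2 * eps) for a, b in zip(embed(u, v + eps), embed(u, v - eps))]
    g_uu = sum(c * c for c in du)
    g_uv = sum(c * d for c, d in zip(du, dv))
    g_vv = sum(c * c for c in dv)
    return g_uu, g_uv, g_vv

def surface_area(embed, u_range, v_range, n=100):
    """Midpoint-rule integral of sqrt(det g) over the parameter rectangle."""
    (u0, u1), (v0, v1) = u_range, v_range
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            g_uu, g_uv, g_vv = induced_metric(embed, u, v)
            det = g_uu * g_vv - g_uv * g_uv
            total += math.sqrt(max(det, 0.0)) * hu * hv
    return total

area = surface_area(embedding, (0.0, math.pi), (0.0, 2 * math.pi))
print(area, 4 * math.pi)
```

For the unit sphere the estimate should come out close to $4\pi$, in line with $d(\text{Area}) = \sin\theta\, d\theta\, d\phi$.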


12.3 Stokes’ theorem 


All of the integral theorems of classical vector calculus are special cases of
Stokes’ Theorem: If $\partial\Omega$ denotes the (oriented) boundary of the (oriented)
region $\Omega$, then

$$\int_\Omega d\omega = \int_{\partial\Omega} \omega.$$


We will not provide a detailed proof. Apart from notation, it would 
parallel the proof of Stokes’ or Green’s theorems in ordinary vector calculus: 
The exterior derivative d has been defined so that the theorem holds for 
an infinitesimal square, cube, or hypercube. We therefore divide $\Omega$ into
many such small regions. We then observe that the contributions of the
interior boundary faces cancel because all interior faces are shared between 
two adjacent regions, and so occur twice with opposite orientations. Only 
the contribution of the outer boundary remains. 

Figure 12.6: Sphere and circumscribed cylinder.


Example: If $\Omega$ is a region of $\mathbb{R}^2$, then from

$$d\left[ \frac{1}{2}(x\, dy - y\, dx) \right] = dx\, dy,$$

we have

$$\text{Area}(\Omega) = \int_\Omega dx\, dy = \frac{1}{2} \int_{\partial\Omega} (x\, dy - y\, dx).$$
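
For a polygonal region the boundary integral $\frac{1}{2}\int_{\partial\Omega}(x\, dy - y\, dx)$ can be done edge by edge: along the straight edge from $(x_0, y_0)$ to $(x_1, y_1)$ the integrand contributes $x_0 y_1 - x_1 y_0$. The Python sketch below uses this to compute areas; the function name `polygon_area` and the sample polygons are assumptions made for the illustration.

```python
def polygon_area(vertices):
    """Signed area of a polygon from the boundary integral (1/2) of (x dy - y dx).

    The vertices are listed in order round the boundary. An anticlockwise
    listing gives a positive area and a clockwise listing a negative one.
    """
    total = 0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        # Integral of (x dy - y dx) along the straight edge from (x0, y0) to (x1, y1).
        total += x0 * y1 - x1 * y0
    return total / 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]

print(polygon_area(square))        # unit square, anticlockwise
print(polygon_area(square[::-1]))  # same square with the boundary orientation reversed
print(polygon_area(triangle))      # right triangle with legs 4 and 3
```

Reversing the orientation of the boundary changes the sign of the result, as expected for the integral of a form over an oriented region.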


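Example: Again, if $\Omega$ is a region of $\mathbb{R}^2$, then from
$d[r^2 d\theta/2] = r\, dr\, d\theta$ we have

$$\text{Area}(\Omega) = \int_\Omega r\, dr\, d\theta = \frac{1}{2} \int_{\partial\Omega} r^2\, d\theta.$$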


Example: If $\Omega$ is the interior of a sphere of radius $R$, then

$$\int_\Omega dx\, dy\, dz = \int_{\partial\Omega} x\, dy\, dz = \frac{4}{3}\pi R^3.$$

Here we have referred back to (12.24) to evaluate the surface integral.
Example: Archimedes’ tombstone. 

Archimedes of Syracuse gave instructions that his tombstone should have
displayed on it a diagram consisting of a sphere and circumscribed cylinder.
Cicero, while serving as quaestor in Sicily, had the stone restored.² Cicero
himself suggested that this act was the only significant contribution by a
Roman to the history of pure mathematics. The carving on the stone was to
commemorate Archimedes’ results about the areas and volumes of spheres,
including the one illustrated in figure 12.6, that the area of the spherical cap
cut off by slicing through the cylinder is equal to the area cut off on the
cylinder.


² Marcus Tullius Cicero, Tusculan Disputations, Book V, Sections 64-66.

We can understand this result via Stokes’ theorem: If the two-sphere $S^2$
is parametrized by spherical polar co-ordinates $\theta$, $\phi$, and $\Omega$ is a region
on the sphere, then

$$\text{Area}(\Omega) = \int_\Omega \sin\theta\, d\theta\, d\phi = \int_{\partial\Omega} (1 - \cos\theta)\, d\phi,$$

and applying this to the figure, where the cap is defined by $\theta < \theta_0$, gives

$$\text{Area}(\text{cap}) = 2\pi(1 - \cos\theta_0),$$

which is indeed the area cut off on the cylinder.


Exercise 12.2: The sphere $S^n$ can be thought of as the locus of points in
$\mathbb{R}^{n+1}$ obeying $\sum_{i=1}^{n+1} (x^i)^2 = 1$. Use its invariance under
orthogonal transformations to show that the element of surface “volume” of the
n-sphere can be written as

$$d(\text{Volume on } S^n) = \frac{1}{n!}\, \epsilon_{\alpha_1 \alpha_2 \ldots \alpha_{n+1}}\, x^{\alpha_1}\, dx^{\alpha_2} \ldots dx^{\alpha_{n+1}}.$$

Use Stokes’ theorem to relate the integral of this form over the surface of the
sphere to the volume of the solid unit sphere. Confirm that we get the correct
proportionality between the volume of the solid unit sphere and the volume
or area of its surface.


12.4 Applications 


We now know how to integrate forms. What sort of forms should we seek 
to integrate? For a physicist working with a classical or quantum field, a 
plentiful supply of interesting forms is obtained by using the field to pull back
geometric objects. 


12.4.1 Pull-backs and push-forwards 


If we have a map $\phi$ from a manifold $M$ to another manifold $N$, and we choose
a point $x \in M$, we can push forward a vector from $TM_x$ to $TN_{\phi(x)}$, in the
obvious way (map head-to-head and tail-to-tail). This map is denoted by
$\phi_* : TM_x \to TN_{\phi(x)}$.


Figure 12.7: Pushing forward a vector $X$ from $TM_x$ to $TN_{\phi(x)}$.


If the vector $X$ has components $X^\mu$ and the map takes the point with
co-ordinates $x^\mu$ to one with coordinates $\xi^\mu(x)$, the vector $\phi_* X$ has
components

$$(\phi_* X)^\mu = \frac{\partial \xi^\mu}{\partial x^\nu}\, X^\nu. \tag{12.29}$$

This looks like the transformation formula for contravariant vector
components under a change of coordinate system. What we are doing here is
conceptually different, however. A change of co-ordinates produces a passive
transformation, i.e. a new description for an unchanging vector. A push
forward is an active transformation: we are changing a vector into a different
one. Furthermore, the map from $M \to N$ is not being assumed to be
one-to-one, so, contrary to the requirement imposed on a co-ordinate
transformation, it may not be possible to invert the functions $\xi^\mu(x)$ and
write the $x^\nu$’s as functions of the $\xi^\mu$’s.

While we can push forward individual vectors, we cannot always push
forward a vector field $X$ from $TM$ to $TN$. If two distinct points $x_1$ and $x_2$
should happen to map to the same point $\xi \in N$, and $X(x_1) \ne X(x_2)$, we
would not know whether to choose $\phi_*[X(x_1)]$ or $\phi_*[X(x_2)]$ as
$[\phi_* X](\xi)$. This problem does not occur for differential forms. A map
$\phi : M \to N$ induces a natural, and always well-defined, pull-back map
$\phi^* : \bigwedge^p(T^*N) \to \bigwedge^p(T^*M)$ which works as follows: Given a form
$\omega \in \bigwedge^p(T^*N)$, we define $\phi^*\omega$ as a form on $M$ by specifying what we
get when we plug the vectors $X_1, X_2, \ldots, X_p \in TM$ into it. We evaluate the
form at $x \in M$ by pushing the vectors $X_i(x)$ forward from $TM_x$ to
$TN_{\phi(x)}$, plugging them into $\omega$ at $\phi(x)$ and declaring the result to be
the evaluation of $\phi^*\omega$ on the $X_i$ at $x$. Symbolically

$$[\phi^*\omega](X_1, X_2, \ldots, X_p) = \omega(\phi_* X_1, \phi_* X_2, \ldots, \phi_* X_p). \tag{12.30}$$

This may seem rather abstract, but the idea is in practice quite simple:
If the map takes $x \in M \to \xi(x) \in N$, and if

$$\omega = \frac{1}{p!}\, \omega_{i_1 \ldots i_p}(\xi)\, d\xi^{i_1} \ldots d\xi^{i_p}, \tag{12.31}$$

then

$$\phi^*\omega = \frac{1}{p!}\, \omega_{i_1 i_2 \ldots i_p}[\xi(x)]\, d\xi^{i_1}(x)\, d\xi^{i_2}(x) \cdots d\xi^{i_p}(x) = \frac{1}{p!}\, \omega_{i_1 i_2 \ldots i_p}[\xi(x)]\, \frac{\partial \xi^{i_1}}{\partial x^{\mu_1}} \frac{\partial \xi^{i_2}}{\partial x^{\mu_2}} \cdots \frac{\partial \xi^{i_p}}{\partial x^{\mu_p}}\, dx^{\mu_1} \cdots dx^{\mu_p}. \tag{12.32}$$

Computationally, the process of pulling back a form is so transparent that
it is easy to confuse it with a simple change of variable. That it is not the same
operation will become clear in the next few sections where we consider maps
that are many-to-one.


Exercise 12.3: Show that the operation of taking an exterior derivative
commutes with a pull back:

$$d[\phi^*\omega] = \phi^*(d\omega).$$


Exercise 12.4: If the map $\phi : M \to N$ is invertible then we may push forward
a vector field $X$ on $M$ to get a vector field $\phi_* X$ on $N$. Show that

$$\mathcal{L}_X[\phi^*\omega] = \phi^*[\mathcal{L}_{\phi_* X}\, \omega].$$


Exercise 12.5: Again assume that $\phi : M \to N$ is invertible. By using the
co-ordinate expressions for the Lie bracket along with the effect of a
push-forward, show that if $X$, $Y$ are vector fields on $TM$ then

$$\phi_*([X, Y]) = [\phi_* X, \phi_* Y],$$

as vector fields on $TN$.


12.4.2 Spin textures 


As an application of pull-backs we consider some of the topological aspects
of spin textures which are fields of unit vectors $\mathbf{n}$, or “spins”, in two or
three dimensions.

Consider a smooth map $\varphi : \mathbb{R}^2 \to S^2$ that assigns $x \mapsto \mathbf{n}(x)$, where
$\mathbf{n}$ is a three-dimensional unit vector whose tip defines a point on the
2-sphere $S^2$. A physical example of such an $\mathbf{n}(x)$ would be the local direction
of the spin polarization in a ferromagnetically-coupled two-dimensional
electron gas.

In terms of $\mathbf{n}$, the area 2-form on the 2-sphere becomes

$$\Omega = \frac{1}{2}\, \mathbf{n} \cdot (d\mathbf{n} \times d\mathbf{n}) = \frac{1}{2}\, \epsilon_{ijk}\, n^i\, dn^j\, dn^k. \tag{12.33}$$

The $\varphi$ map pulls this area-form back to

$$F \equiv \varphi^*\Omega = \frac{1}{2} \left( \epsilon_{ijk}\, n^i\, \partial_\mu n^j\, \partial_\nu n^k \right) dx^\mu dx^\nu = \left( \epsilon_{ijk}\, n^i\, \partial_1 n^j\, \partial_2 n^k \right) dx^1 dx^2 \tag{12.34}$$

which is a differential form in $\mathbb{R}^2$. We will call it the topological charge
density. It measures the area on the two-sphere swept out by the $\mathbf{n}$ vectors
as we explore a square in $\mathbb{R}^2$ of side $dx^1$ by $dx^2$.

Suppose now that $\mathbf{n}$ tends to the same unit vector $\mathbf{n}(\infty)$ at large
distance in all directions. This allows us to think of “infinity” as a single
point, and the assignment $\varphi : x \mapsto \mathbf{n}(x)$ as a map from $S^2$ to $S^2$. Such
maps are characterized topologically by their “topological charge,” or winding
number $N$ which counts the number of times the image of the originating
$x$ sphere wraps round the target $\mathbf{n}$-sphere. A mathematician would call this
number the Brouwer degree of the map $\varphi$. It is intuitively plausible that a
continuous map from a sphere to itself will wrap a whole number of times, and
so we expect

$$N = \frac{1}{4\pi} \int_{\mathbb{R}^2} \left\{ \epsilon_{ijk}\, n^i\, \partial_1 n^j\, \partial_2 n^k \right\} dx^1 dx^2, \tag{12.35}$$

to be an integer. We will soon show that this is indeed so, but first we will
demonstrate that $N$ is a topological invariant.

In two dimensions the form $F = \varphi^*\Omega$ is automatically closed because
the exterior derivative of any two-form is zero, there being no three-forms
in two dimensions. Even if we consider a field $\mathbf{n}(x^1, \ldots, x^m)$ in $m > 2$
dimensions, however, we still have $dF = 0$. This is because

$$dF = \frac{1}{2}\, \epsilon_{ijk}\, \partial_\sigma n^i\, \partial_\mu n^j\, \partial_\nu n^k\, dx^\sigma dx^\mu dx^\nu. \tag{12.36}$$


If we plug infinitesimal vectors into the $dx^\mu$ to get their components
$\delta x^\mu$, we have to evaluate the triple-product of three vectors
$\delta n^i = \partial_\mu n^i\, \delta x^\mu$, each of which is tangent to the two-sphere. But
the tangent space of $S^2$ is two-dimensional, so any three tangent vectors
$\mathbf{t}_1$, $\mathbf{t}_2$, $\mathbf{t}_3$ are linearly dependent and their triple-product
$\mathbf{t}_1 \cdot (\mathbf{t}_2 \times \mathbf{t}_3)$ is therefore zero.

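Although it is closed, $F = \varphi^*\Omega$ will not generally be the $d$ of a globally
defined one-form. Suppose, however, that we vary the map so that
$\mathbf{n}(x) \to \mathbf{n}(x) + \delta\mathbf{n}(x)$. The corresponding change in the topological
charge density is

$$\delta F = \varphi^*[\mathbf{n} \cdot (d(\delta\mathbf{n}) \times d\mathbf{n})], \tag{12.37}$$

and this variation can be written as a total derivative:

$$\delta F = d\{\varphi^*[\mathbf{n} \cdot (\delta\mathbf{n} \times d\mathbf{n})]\} \equiv d\{\epsilon_{ijk}\, n^i\, \delta n^j\, \partial_\mu n^k\, dx^\mu\}. \tag{12.38}$$

In these manipulations we have used
$\delta\mathbf{n} \cdot (d\mathbf{n} \times d\mathbf{n}) = d\mathbf{n} \cdot (\delta\mathbf{n} \times d\mathbf{n}) = 0$, the
triple-products being zero for the linear-dependence reason adduced earlier.
From Stokes’ theorem, we have

$$\delta N = \frac{1}{4\pi} \int_{S^2} \delta F = \frac{1}{4\pi} \int_{\partial S^2} \epsilon_{ijk}\, n^i\, \delta n^j\, \partial_\mu n^k\, dx^\mu. \tag{12.39}$$

Because the sphere has no boundary, i.e. $\partial S^2 = \emptyset$, the last integral
vanishes, so we conclude that $\delta N = 0$ under any smooth deformation of the map
$\mathbf{n}(x)$. This is what we mean when we say that $N$ is a topological invariant.
Equivalently, on $\mathbb{R}^2$, with $\mathbf{n}$ constant at infinity, we have

$$\delta N = \frac{1}{4\pi} \int_{\mathbb{R}^2} \delta F = \frac{1}{4\pi} \int_\Gamma \epsilon_{ijk}\, n^i\, \delta n^j\, \partial_\mu n^k\, dx^\mu, \tag{12.40}$$

where $\Gamma$ is a curve surrounding the origin at large distance. Again
$\delta N = 0$, this time because $\partial_\mu n^k = 0$ everywhere on $\Gamma$.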

In some physical applications, the field n winds in localized regions called 
skyrmions. These knots in the spin field behave very much as elementary 
particles, retaining their identity as they move through the system. The 
winding number counts how many skyrmions (minus the number of anti- 
skyrmions, which wind with opposite orientation) there are. To construct 
a smooth multi-skyrmion map $\varphi : \mathbb{R}^2 \to S^2$ with positive winding number
$N$, take a set of $N + 1$ complex numbers $\lambda, a_1, \ldots, a_N$ and another set of $N$
complex numbers $b_1, \ldots, b_N$ such that no $b$ coincides with any $a$. Then put

$$e^{i\phi}\tan\frac{\theta}{2} = \lambda\,\frac{(z-a_1)\cdots(z-a_N)}{(z-b_1)\cdots(z-b_N)}, \qquad (12.41)$$

where $z = x^1 + ix^2$, and $\theta$ and $\phi$ are spherical polar co-ordinates specifying
the direction $\mathbf{n}$ at the point $(x^1, x^2)$. At the points $z = a_i$ the vector $\mathbf{n}$ points
straight up, and at the points $z = b_i$ it points straight down. You will show in
exercise 12.12 that this particular $\mathbf{n}$-field configuration minimizes the energy
functional

$$E[\mathbf{n}] = \frac{1}{2}\int \left(\partial_1 n^i\,\partial_1 n^i + \partial_2 n^i\,\partial_2 n^i\right) dx^1 dx^2
= \frac{1}{2}\int \left(|\nabla n^1|^2 + |\nabla n^2|^2 + |\nabla n^3|^2\right) dx^1 dx^2 \qquad (12.42)$$

for the given winding number $N$. In the next section we will explain the
geometric origin of the mysterious combination $e^{i\phi}\tan\theta/2$.
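As a numerical illustration, the following sketch evaluates the integral (12.35) for the rational map (12.41) with $N = 1$. It is only a rough check under stated assumptions: the function names (`n_field`, `triple`, `winding_number`), the sample values of $\lambda$, $a_1$, $b_1$, the finite integration square and the grid spacing are all choices made here for illustration and do not come from the text. The derivatives are replaced by central differences, and the part of the integral outside the square is simply dropped, so the result is expected to be close to, but not exactly, an integer.

```python
import math

def n_field(z, lam, a_list, b_list):
    """Unit vector n at z = x1 + i*x2, from e^{i phi} tan(theta/2) = w(z) as in (12.41)."""
    w = complex(lam)
    for a in a_list:
        w *= z - a
    for b in b_list:
        w /= z - b
    r2 = abs(w) ** 2
    # n1 + i n2 = sin(theta) e^{i phi} = 2w/(1+|w|^2),  n3 = cos(theta) = (1-|w|^2)/(1+|w|^2)
    return (2.0 * w.real / (1.0 + r2), 2.0 * w.imag / (1.0 + r2), (1.0 - r2) / (1.0 + r2))

def triple(u, v, w):
    """Scalar triple product u . (v x w)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            + u[1] * (v[2] * w[0] - v[0] * w[2])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def winding_number(lam, a_list, b_list, half_width=10.0, h=0.1):
    """Midpoint-rule estimate of (12.35) over the square [-half_width, half_width]^2."""
    m = int(round(2.0 * half_width / h))
    total = 0.0
    for i in range(m):
        x = -half_width + (i + 0.5) * h
        for j in range(m):
            y = -half_width + (j + 0.5) * h
            z = complex(x, y)
            n0 = n_field(z, lam, a_list, b_list)
            # central differences for the partial derivatives of n along x1 and x2
            n_xp = n_field(z + 0.5 * h, lam, a_list, b_list)
            n_xm = n_field(z - 0.5 * h, lam, a_list, b_list)
            n_yp = n_field(z + 0.5j * h, lam, a_list, b_list)
            n_ym = n_field(z - 0.5j * h, lam, a_list, b_list)
            d1 = [(p - q) / h for p, q in zip(n_xp, n_xm)]
            d2 = [(p - q) / h for p, q in zip(n_yp, n_ym)]
            total += triple(n0, d1, d2) * h * h
    return total / (4.0 * math.pi)

# Illustrative single skyrmion: lambda = 1, a_1 = -0.5, b_1 = 0.62 + 0.31i (arbitrary values).
N1 = winding_number(1.0, [-0.5], [0.62 + 0.31j])
print(N1)
```

The pole $b_1$ is deliberately placed off the sampling grid so that the rational function is never evaluated at its singularity. Enlarging `half_width` or reducing `h` should bring the estimate closer to an integer, at the cost of a longer run.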


12.4.3 The Hopf map 


You may recall that in section 10.2.3 we defined complex projective space
$\mathbb{C}P^n$ to be the set of rays in a complex $(n+1)$-dimensional vector space.
A ray is an equivalence class of vectors $[\zeta_1, \zeta_2, \ldots, \zeta_{n+1}]$, where the $\zeta_i$ are
not all zero, and where we do not distinguish between $[\zeta_1, \zeta_2, \ldots, \zeta_{n+1}]$ and
$[\lambda\zeta_1, \lambda\zeta_2, \ldots, \lambda\zeta_{n+1}]$ for non-zero complex $\lambda$. The space of rays is a
$2n$-dimensional real manifold: in a region where $\zeta_{n+1}$ does not vanish, we can
take as co-ordinates the real numbers $\xi_1, \ldots, \xi_n$, $\eta_1, \ldots, \eta_n$ where

$$\xi_1 + i\eta_1 = \frac{\zeta_1}{\zeta_{n+1}}, \quad \xi_2 + i\eta_2 = \frac{\zeta_2}{\zeta_{n+1}}, \quad \ldots, \quad \xi_n + i\eta_n = \frac{\zeta_n}{\zeta_{n+1}}. \qquad (12.43)$$

Similar co-ordinate charts can be constructed in the regions where other $\zeta_i$
are non-zero. Every point in $\mathbb{C}P^n$ lies in at least one of these co-ordinate
charts, and the co-ordinate transformation rules for going from one chart to
another are smooth.

The simplest complex projective space, $\mathbb{C}P^1$, is the real two-sphere $S^2$ in
disguise. This rather non-obvious fact is revealed by the use of a stereographic
map to make the equivalence class $[\zeta_1, \zeta_2] \in \mathbb{C}P^1$ correspond to a point $\mathbf{n}$ on
the sphere. When $\zeta_1$ is non-zero, the class $[\zeta_1, \zeta_2]$ is uniquely determined by
the ratio $\zeta_2/\zeta_1 = |\zeta_2/\zeta_1|e^{i\phi}$, which we plot on the complex plane. We think
of this copy of $\mathbb{C}$ as being the $x$, $y$ plane in $\mathbb{R}^3$. We then draw a straight line
connecting the plotted point to the south pole of a unit sphere circumscribed
about the origin in $\mathbb{R}^3$. The point where this line (continued, if necessary)
intersects the sphere is the tip of the unit vector $\mathbf{n}$.

Figure 12.8: Two views of the stereographic map between the two-sphere
and the complex plane. The point $\zeta = \zeta_2/\zeta_1 \in \mathbb{C}$ corresponds to the unit
vector $\mathbf{n} \in S^2$.


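If $\zeta_2$ were zero, $\mathbf{n}$ would end up at the north pole, where the $\mathbb{R}^3$ co-ordinate
$z$ takes the value $z = 1$. If $\zeta_1$ goes to zero with $\zeta_2$ fixed, $\mathbf{n}$ moves smoothly to
the south pole $z = -1$. We therefore extend the definition of our map to the
case $\zeta_1 = 0$ by making the equivalence class $[0, \zeta_2]$ correspond to the south
pole. We can find an explicit formula for this map. Figure 12.8 shows that
$\zeta_2/\zeta_1 = e^{i\phi}\tan\theta/2$, and this relation suggests the use of the “t”-substitution
formulae

$$\sin\theta = \frac{2t}{1+t^2}, \qquad \cos\theta = \frac{1-t^2}{1+t^2}, \qquad (12.44)$$

where $t = \tan\theta/2$. Since the $x$, $y$, $z$ components of $\mathbf{n}$ are given by

$$n^1 = \sin\theta\cos\phi, \qquad n^2 = \sin\theta\sin\phi, \qquad n^3 = \cos\theta, \qquad (12.45)$$

we find that

$$n^1 + in^2 = \frac{2(\zeta_2/\zeta_1)}{1+|\zeta_2/\zeta_1|^2}, \qquad n^3 = \frac{1-|\zeta_2/\zeta_1|^2}{1+|\zeta_2/\zeta_1|^2}. \qquad (12.46)$$

We can multiply through by $|\zeta_1|^2 = \bar\zeta_1\zeta_1$, and so write this correspondence
in a more symmetrical manner:

$$n^1 = \frac{\bar\zeta_1\zeta_2 + \bar\zeta_2\zeta_1}{|\zeta_1|^2+|\zeta_2|^2}, \qquad
n^2 = \frac{1}{i}\left(\frac{\bar\zeta_1\zeta_2 - \bar\zeta_2\zeta_1}{|\zeta_1|^2+|\zeta_2|^2}\right), \qquad
n^3 = \frac{|\zeta_1|^2-|\zeta_2|^2}{|\zeta_1|^2+|\zeta_2|^2}. \qquad (12.47)$$

This last form can be conveniently expressed in terms of the Pauli sigma
matrices

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad (12.48)$$

as

$$n^1 = (\bar z_1, \bar z_2)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}, \qquad
n^2 = (\bar z_1, \bar z_2)\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}, \qquad
n^3 = (\bar z_1, \bar z_2)\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}, \qquad (12.49)$$

where

$$\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \frac{1}{\sqrt{|\zeta_1|^2+|\zeta_2|^2}}\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \qquad (12.50)$$

is a normalized 2-vector, which we think of as a spinor.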

The correspondence $\mathbb{C}P^1 \simeq S^2$ now has a quantum-mechanical
interpretation: any unit three-vector $\mathbf{n}$ can be obtained as the expectation value
of the $\sigma$ matrices in a normalized spinor state. Conversely, any normalized
spinor $\psi = (z_1, z_2)^T$ gives rise to a unit vector via

$$n^i = \psi^\dagger \sigma_i \psi. \qquad (12.51)$$

Now, since

$$1 = |z_1|^2 + |z_2|^2, \qquad (12.52)$$

the normalized spinor can be thought of as defining a point in $S^3$. This
means that the one-to-one correspondence $[z_1, z_2] \leftrightarrow \mathbf{n}$ also gives rise to a
map from $S^3 \to S^2$. This is called the Hopf map:

$$\mathrm{Hopf} : S^3 \to S^2. \qquad (12.53)$$

The dimension reduces from three to two, so the Hopf map cannot be
one-to-one. Even after we have normalized $[\zeta_1, \zeta_2]$, we are still left with a choice of
overall phase. Both $(z_1, z_2)$ and $(z_1e^{i\theta}, z_2e^{i\theta})$, although distinct points in $S^3$,
correspond to the same point in $\mathbb{C}P^1$, and hence in $S^2$. The inverse image of
a point in $S^2$ is a geodesic circle in $S^3$. Later, we will show that any two such
geodesic circles are linked, and this makes the Hopf map topologically
non-trivial, in that it cannot be continuously deformed to a constant map, i.e. to
a map that takes all of $S^3$ to a single point in $S^2$.
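The construction above can be tried out numerically. The short sketch below implements $n^i = \psi^\dagger\sigma_i\psi$ written out in components, and illustrates that the result is a unit vector which does not change when the spinor is multiplied by an overall phase. The function name `hopf` and the sample spinors are illustrative choices made here, not notation from the text.

```python
import cmath
import math

def hopf(z1, z2):
    """Send a non-zero spinor (z1, z2) to the unit vector n = psi^dagger sigma psi.

    The spinor is normalized first, so any non-zero pair may be supplied.
    """
    norm = math.sqrt(abs(z1) ** 2 + abs(z2) ** 2)
    z1, z2 = z1 / norm, z2 / norm
    cross = z1.conjugate() * z2            # \bar z_1 z_2, so that n1 + i n2 = 2 \bar z_1 z_2
    return (2.0 * cross.real, 2.0 * cross.imag, abs(z1) ** 2 - abs(z2) ** 2)

north = hopf(1.0, 0.0)                     # zeta_2 = 0: the north pole
south = hopf(0.0, 1.0)                     # zeta_1 = 0: the south pole
n = hopf(0.6, 0.8j)

# Multiplying both components by the same phase moves the point along a
# circle in S^3 but leaves its image on S^2 unchanged.
alpha = 1.234
n_rotated = hopf(0.6 * cmath.exp(1j * alpha), 0.8j * cmath.exp(1j * alpha))

print(north, south)
print(n, n_rotated)
```

Sweeping `alpha` from $0$ to $2\pi$ traces out the circle in $S^3$ that is the inverse image of the single point `n`.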


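Exercise 12.6: We have seen that the stereographic map relates the point with
spherical polar co-ordinates $\theta$, $\phi$ to the complex number

$$\zeta = e^{i\phi}\tan\theta/2.$$

We can therefore set $\zeta = \xi + i\eta$ and take $\xi$, $\eta$ as stereographic co-ordinates on
the sphere. Show that in these co-ordinates the sphere metric is given by

$$g(\ ,\ ) \equiv d\theta\otimes d\theta + \sin^2\theta\, d\phi\otimes d\phi
= \frac{4}{(1+\xi^2+\eta^2)^2}\,(d\xi\otimes d\xi + d\eta\otimes d\eta),$$

and that the area 2-form becomes

$$\Omega \equiv \sin\theta\, d\theta\wedge d\phi
= \frac{2i}{(1+|\zeta|^2)^2}\, d\zeta\wedge d\bar\zeta
= \frac{4}{(1+\xi^2+\eta^2)^2}\, d\xi\wedge d\eta.$$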


12.4.4 Homotopy and the Hopf map 


We can use the Hopf map to factor the map $\varphi : x \mapsto \mathbf{n}(x)$ via the
three-sphere by specifying the spinor $\psi$ at each point, instead of the vector $\mathbf{n}$, and
so mapping indirectly

$$\varphi : \mathbb{R}^2 \xrightarrow{\ \psi\ } S^3 \xrightarrow{\ \mathrm{Hopf}\ } S^2. \qquad (12.54)$$

It might seem that for a given spin-field $\mathbf{n}(x)$ we can choose the overall phase
of $\psi(x) = (z_1(x), z_2(x))^T$ as we like. However, if we demand that the $z_i$’s
be continuous functions of $x$ then there is a rather non-obvious topological
restriction which has important physical consequences. To see how this comes
about, we first express the winding number in terms of the $z_i$. We find (after
a page or two of algebra) that

$$F \equiv (\epsilon_{ijk}\, n^i\,\partial_1 n^j\,\partial_2 n^k)\, dx^1 dx^2
= \frac{2}{i}\sum_{i=1}^{2}\left(\partial_1\bar z_i\,\partial_2 z_i - \partial_2\bar z_i\,\partial_1 z_i\right) dx^1 dx^2, \qquad (12.55)$$

and so the topological charge $N$ is given by

$$N = \frac{1}{2\pi i}\int \sum_{i=1}^{2}\left(\partial_1\bar z_i\,\partial_2 z_i - \partial_2\bar z_i\,\partial_1 z_i\right) dx^1 dx^2. \qquad (12.56)$$

Now, when written in terms of the $z_i$ variables, the form $F$ becomes a total
derivative:

$$F = \frac{2}{i}\sum_{i=1}^{2}\left(\partial_1\bar z_i\,\partial_2 z_i - \partial_2\bar z_i\,\partial_1 z_i\right) dx^1 dx^2
= d\left\{\frac{1}{i}\sum_{i=1}^{2}\left(\bar z_i\,\partial_\mu z_i - (\partial_\mu\bar z_i)\, z_i\right) dx^\mu\right\}. \qquad (12.57)$$

Furthermore, because $\mathbf{n}$ is fixed at large distance, we have $(z_1, z_2) = e^{i\theta}(c_1, c_2)$
near infinity, where $c_1$, $c_2$ are constants with $|c_1|^2 + |c_2|^2 = 1$. Thus, near
infinity,

$$\frac{1}{2i}\sum_{i=1}^{2}\left(\bar z_i\,\partial_\mu z_i - (\partial_\mu\bar z_i)\, z_i\right) dx^\mu \to (|c_1|^2 + |c_2|^2)\, d\theta = d\theta. \qquad (12.58)$$

We combine this observation with Stokes’ theorem to obtain

$$N = \frac{1}{2\pi i}\oint_\Gamma \frac{1}{2}\sum_{i=1}^{2}\left(\bar z_i\,\partial_\mu z_i - (\partial_\mu\bar z_i)\, z_i\right) dx^\mu = \frac{1}{2\pi}\oint_\Gamma d\theta. \qquad (12.59)$$

Here, as in the previous section, $\Gamma$ is a curve surrounding the origin at large
distance. Now $\oint_\Gamma d\theta$ is the total change in $\theta$ as we circle the boundary. While
the phase $e^{i\theta}$ has to return to its original value after a round trip, the angle
$\theta$ can increase by an integer multiple of $2\pi$. The winding number $\oint d\theta/2\pi$
can therefore be non-zero, but must be an integer.

We have uncovered the rather surprising fact that the topological charge
of the map $\varphi : S^2 \to S^2$ is equal to the winding number of the phase angle
$\theta$ at infinity. This is the topological restriction referred to in the preceding
paragraph. As a byproduct, we have confirmed our conjecture that the
topological charge $N$ is an integer. The existence of this integer invariant
shows that the smooth maps $\varphi : S^2 \to S^2$ fall into distinct homotopy classes
labelled by $N$. Maps with different values of $N$ cannot be continuously
deformed into one another, and, while we have not shown that it is so, two
maps with the same value of $N$ can be deformed into each other.

Maps that can be continuously deformed one into the other are said to
be homotopic. The set of homotopy classes of the maps of the $n$-sphere into
a manifold $M$ is denoted by $\pi_n(M)$. In the present case $M = S^2$. We are
therefore claiming that

$$\pi_2(S^2) = \mathbb{Z}, \qquad (12.60)$$

where we are identifying the homotopy class with its winding number $N \in \mathbb{Z}$.
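The boundary winding number $\oint_\Gamma d\theta/2\pi$ of the phase is easy to evaluate from sampled data. The sketch below is a minimal illustration, assuming that the phase factor $e^{i\theta}$ is known at a sequence of points around $\Gamma$ and that neighbouring samples differ in phase by less than $\pi$. The function name `phase_winding`, the helper `boundary_phase`, and the sample count are choices made here for illustration.

```python
import cmath
import math

def phase_winding(samples):
    """Total change in arg(z) around a closed loop of non-zero complex samples, over 2*pi."""
    total = 0.0
    count = len(samples)
    for k in range(count):
        z_now, z_next = samples[k], samples[(k + 1) % count]
        # cmath.phase returns the increment wrapped into (-pi, pi]; summing these
        # small increments follows the angle continuously around the loop.
        total += cmath.phase(z_next / z_now)
    return total / (2.0 * math.pi)

def boundary_phase(winding, count=200):
    """Samples of e^{i theta} on a large circle when theta advances `winding` times."""
    return [cmath.exp(1j * winding * 2.0 * math.pi * k / count) for k in range(count)]

w3 = phase_winding(boundary_phase(3))
w_minus2 = phase_winding(boundary_phase(-2))
w0 = phase_winding([cmath.exp(0.3j)] * 50)   # constant phase: no winding
print(round(w3), round(w_minus2), round(w0))
```

Although $e^{i\theta}$ returns to its starting value in every case, the accumulated angle distinguishes the loops, which is the content of (12.59).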


12.4.5 The Hopf index 


We have so far discussed maps from $S^2$ to $S^2$. It is perhaps not too surprising
that such maps are classified by a winding number. What is rather more
surprising is that maps $\varphi : S^3 \to S^2$ also have an associated topological
number. If we continue to assume that $\mathbf{n}$ tends to a constant vector at
infinity so that we can think of $\mathbb{R}^3 \cup \{\infty\}$ as being $S^3$, this number will label
the homotopy classes $\pi_3(S^2)$ of fields of unit vectors $\mathbf{n}$ in three dimensions.
We will think of the third dimension as being time. In this situation an
interesting set of $\mathbf{n}$ fields to consider are the $\mathbf{n}(x, t)$ corresponding to moving
skyrmions. The world lines of these skyrmions will be tubes outside of which
$\mathbf{n}$ is constant, and such that on any slice through the tube, $\mathbf{n}$ will cover the
target $\mathbf{n}$-sphere once.

To motivate the formula we will find for the topological number, we begin
with a problem from magnetostatics. Suppose we are given a cable originally
made up of a bundle of many parallel wires. The cable is then twisted $N$
times about its axis and bent into a closed loop, the end of each individual
wire being attached to its beginning to make a continuous circuit. A current $I$
flows in the cable in such a manner that each individual wire carries only an
infinitesimal part $\delta I_i$ of the total. The sense of the current is such that as we
flow with it around the cable each wire wraps $N$ times anticlockwise about
all the others. The current produces a magnetic field $\mathbf{B}$. Can we determine
the integer twisting number $N$ knowing only this $\mathbf{B}$ field?

Figure 12.9: A twisted cable with $N = 5$.
The answer is yes. We use Ampère’s law in integral form,

$$\oint_\Gamma \mathbf{B}\cdot d\mathbf{r} = (\text{current encircled by } \Gamma). \qquad (12.61)$$

We also observe that the current density $\nabla\times\mathbf{B} = \mathbf{J}$ at a point is directed
along the tangent to the wire passing through that point. We therefore
integrate along each individual wire as it encircles the others, and sum over
the wires to find

$$NI^2 = \sum_{\text{wires } i}\delta I_i\oint \mathbf{B}\cdot d\mathbf{r} = \int \mathbf{B}\cdot\mathbf{J}\, d^3x = \int \mathbf{B}\cdot(\nabla\times\mathbf{B})\, d^3x. \qquad (12.62)$$

We now apply this insight to our three-dimensional field of unit vectors $\mathbf{n}(x)$.
The quantity playing the role of the current density $\mathbf{J}$ is the topological
current

$$J^\sigma = \frac{1}{2}\,\epsilon^{\sigma\mu\nu}\epsilon_{ijk}\, n^i\,\partial_\mu n^j\,\partial_\nu n^k. \qquad (12.63)$$

Observe that $\mathrm{div}\,\mathbf{J} = 0$. This is simply another way of saying that the 2-form
$F = \varphi^*\Omega$ is closed.

The flux of $\mathbf{J}$ through a surface $S$ is

$$I = \int_S \mathbf{J}\cdot d\mathbf{S} = \int_S F, \qquad (12.64)$$

and this is the area of the spherical surface covered by the $\mathbf{n}$’s. A skyrmion,
for example, has total topological current $I = 4\pi$, the total surface area of
the 2-sphere. The skyrmion world-line will play the role of the cable, and
the inverse images in $\mathbb{R}^3$ of points on $S^2$ correspond to the individual wires.

In form language, the field corresponding to $\mathbf{B}$ can be any one-form $A$
such that $dA = F$. Thus

$$N_{\mathrm{Hopf}} = \frac{1}{I^2}\int_{\mathbb{R}^3}\mathbf{B}\cdot\mathbf{J}\, d^3x = \frac{1}{16\pi^2}\int_{\mathbb{R}^3} AF \qquad (12.65)$$

will be an integer. This integer is the Hopf linking number, or Hopf index,
and counts the number of times the skyrmion twists before it bites its tail to
form a closed-loop world-line.

There is another way of obtaining this formula, and of understanding the
number $16\pi^2$. We observe that the two-form $F$ and the one-form $A$ are the
pull-back from $S^3$ to $\mathbb{R}^3$ along $\psi$ of the forms

$$F = \frac{1}{i}\sum_{i=1}^{2}\left(d\bar z_i\, dz_i - dz_i\, d\bar z_i\right), \qquad
A = \frac{1}{i}\sum_{i=1}^{2}\left(\bar z_i\, dz_i - z_i\, d\bar z_i\right), \qquad (12.66)$$

respectively. If we substitute $z_{1,2} = \xi_{1,2} + i\eta_{1,2}$, we find that

$$AF = 8\left(\xi_1\, d\eta_1\, d\xi_2\, d\eta_2 - \eta_1\, d\xi_1\, d\xi_2\, d\eta_2 + \xi_2\, d\eta_2\, d\xi_1\, d\eta_1 - \eta_2\, d\xi_2\, d\xi_1\, d\eta_1\right). \qquad (12.67)$$

We know from exercise 12.2 that this expression is eight times the volume
3-form on the three-sphere. Now the total volume of the unit three-sphere is
$2\pi^2$, and so, from our factored map $x \mapsto \psi \mapsto \mathbf{n}$ we have that

$$N_{\mathrm{Hopf}} = \frac{1}{16\pi^2}\int_{\mathbb{R}^3}\psi^*(AF) = \frac{1}{2\pi^2}\int_{\mathbb{R}^3}\psi^*\, d(\text{Volume on } S^3) \qquad (12.68)$$

is the number of times the normalized spinor $\psi(x)$ covers $S^3$ as $x$ covers $\mathbb{R}^3$.
For the Hopf map itself, this number is unity, and so the loop in $S^3$ that
is the inverse image of a point in $S^2$ will twist once around any other such
inverse image loop.

We have now established that

$$\pi_3(S^2) = \mathbb{Z}. \qquad (12.69)$$

This result, implying that there are many maps from the three-sphere to
the two-sphere that are not smoothly deformable to a constant map, was a
great surprise when Hopf discovered it.

One of the principal physics consequences of the existence of the Hopf 
index is that “quantum lump” quasi-particles such as the skyrmion can 
be fermions, even though they are described by commuting (and therefore 
bosonic) fields. 

To understand how this can be, we first explain that the collection of
homotopy classes $\pi_n(M)$ is not just a set. It has the additional structure
of being a group: we can compose two homotopy classes to get a third, the
composition is associative, and each homotopy class has an inverse. To define
the group composition law, we think of $S^n$ as the interior of an $n$-dimensional
cube with the map $f : S^n \to M$ taking a fixed value $m_0 \in M$ at all points
on the boundary of the cube. The boundary can then be considered to be a
single point on $S^n$. We then take one of the $n$ dimensions as being “time” and
place two cubes and their maps $f_1$, $f_2$ into contact, with $f_1$ being “earlier”
and $f_2$ being “later.” We thus get a continuous map from a bigger box into
$M$. The homotopy class of this map, after we relax the condition that the
map takes the value $m_0$ on the common boundary, defines the composition
$[f_2] \circ [f_1]$ of the two homotopy classes corresponding to $f_1$ and $f_2$. The
composition may be shown to be independent of the choice of representative
functions in the two classes. The inverse of a homotopy class $[f]$ is obtained
by reversing the direction of “time” for each of the maps in the class. This
group structure appears to depend on the fixed point $m_0$. As long as $M$
is arcwise connected, however, the groups obtained from different $m_0$’s are
isomorphic, or equivalent. In the case of $\pi_2(S^2) = \mathbb{Z}$ and $\pi_3(S^2) = \mathbb{Z}$, the
composition law is simply the addition of the integers $N \in \mathbb{Z}$ that label the
classes. A useful exposition of homotopy theory for physicists is to be found
in a review article by David Mermin.³

When we quantize using Feynman’s “sum over histories” path integral,
we have the option of multiplying the contributions of histories $f$ that are
not deformable into one another by distinct phase factors $\exp\{i\phi([f])\}$. The
choice of phases must, however, be compatible with the composition of
histories by concatenating one after the other, which is the same operation as composing
homotopy classes. This means that the product $\exp\{i\phi([f_1])\}\exp\{i\phi([f_2])\}$
of the phase factors for two possible histories must be the phase factor
$\exp\{i\phi([f_2] \circ [f_1])\}$ assigned to the composition of their homotopy classes.
If our quantum system consists of spins $\mathbf{n}$ in two space and one time
dimension we can consistently assign a phase factor $\exp(i\pi N_{\mathrm{Hopf}})$ to a history. The
rotation of a single skyrmion twists the world-line cable through $2\pi$ and so
makes $N_{\mathrm{Hopf}} = 1$. The rotation therefore causes the wavefunction to change
sign. We will show in the next section that a history where two particles
change places can be continuously deformed into a history where they do not
interchange, but instead one of them is twisted through $2\pi$. The
wavefunction of a pair of skyrmions therefore changes sign when they are interchanged.
This means that the quantized skyrmion is a fermion.

³ N. D. Mermin, “The topological theory of defects in ordered media.” Rev. Mod. Phys.
51 (1979) 591.


12.4.6 Twist and writhe 


Consider two oriented non-intersecting closed curves $\gamma_1$ and $\gamma_2$. Imagine that
$\gamma_1$ carries a unit current in the direction of its orientation and so gives rise
to a magnetic field. Ampère’s law then tells us that the number of times $\gamma_1$
encircles $\gamma_2$ is

$$\mathrm{Lk}(\gamma_1, \gamma_2) = \oint_{\gamma_2}\mathbf{B}(\mathbf{r}_2)\cdot d\mathbf{r}_2
= \frac{1}{4\pi}\oint_{\gamma_1}\oint_{\gamma_2}\frac{(\mathbf{r}_1 - \mathbf{r}_2)\cdot(d\mathbf{r}_1\times d\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|^3}. \qquad (12.70)$$

Here the second expression follows from the first by an application of the
Biot-Savart law to compute the $\mathbf{B}$ field due to the current. This expression also
shows that $\mathrm{Lk}(\gamma_1, \gamma_2)$, which is called the Gauss linking number, is symmetric
under the interchange $\gamma_1 \leftrightarrow \gamma_2$ of the two curves. It changes sign, however,
if one of the curves changes orientation, or if the pair of curves is reflected
in a mirror.
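The Gauss double integral (12.70) can be estimated directly for explicit curves. The sketch below does this for two unit circles, once arranged as a Hopf link and once moved apart so that they are unlinked. Everything named here (`linking_number`, the circle parameterizations, the number of sample points) is an illustrative assumption and not part of the text; the line elements $d\mathbf{r}$ are approximated by short chords, so the values come out close to, but not exactly, integers.

```python
import math

def linking_number(curve1, curve2, m=200):
    """Estimate the Gauss integral (12.70) for closed curves r(t), with t in [0, 1)."""
    h = 1.0 / m

    def sample(curve):
        points, chords = [], []
        for k in range(m):
            t = (k + 0.5) * h
            p_plus, p_minus = curve(t + 0.5 * h), curve(t - 0.5 * h)
            points.append(curve(t))
            # dr is approximated by the chord across one parameter step
            chords.append(tuple(a - b for a, b in zip(p_plus, p_minus)))
        return points, chords

    pts1, dr1 = sample(curve1)
    pts2, dr2 = sample(curve2)
    total = 0.0
    for r1, d1 in zip(pts1, dr1):
        for r2, d2 in zip(pts2, dr2):
            sx, sy, sz = r1[0] - r2[0], r1[1] - r2[1], r1[2] - r2[2]
            # cross product d1 x d2
            cx = d1[1] * d2[2] - d1[2] * d2[1]
            cy = d1[2] * d2[0] - d1[0] * d2[2]
            cz = d1[0] * d2[1] - d1[1] * d2[0]
            dist = math.sqrt(sx * sx + sy * sy + sz * sz)
            total += (sx * cx + sy * cy + sz * cz) / dist ** 3
    return total / (4.0 * math.pi)

def circle_xy(t):
    """Unit circle in the x-y plane, centred on the origin."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t), 0.0)

def circle_xz(centre_x):
    """Unit circle in the x-z plane, centred at (centre_x, 0, 0)."""
    return lambda t: (centre_x + math.cos(2 * math.pi * t), 0.0, math.sin(2 * math.pi * t))

lk_linked = linking_number(circle_xy, circle_xz(1.0))    # passes through the first circle
lk_unlinked = linking_number(circle_xy, circle_xz(3.0))  # well separated from it
print(lk_linked, lk_unlinked)
```

The magnitude of the first value should be close to one and the second close to zero. The sign of the first depends on the orientations chosen for the two circles, in line with the remark above that the linking number changes sign when one curve is reversed.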

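We can relate the Gauss linking number to the Brouwer degree of a map.
Introduce parameters $t_1$, $t_2$ with $0 \le t_1, t_2 \le 1$ to label points on the two
curves. The curves are closed, so $\mathbf{r}_1(0) = \mathbf{r}_1(1)$, and similarly for $\mathbf{r}_2$. Let us
also define a unit vector

$$\mathbf{n}(t_1, t_2) = \frac{\mathbf{r}_1(t_1) - \mathbf{r}_2(t_2)}{|\mathbf{r}_1(t_1) - \mathbf{r}_2(t_2)|}. \qquad (12.71)$$

Then

$$\mathrm{Lk}(\gamma_1, \gamma_2)
= \frac{1}{4\pi}\oint_{\gamma_1}\oint_{\gamma_2}\frac{\mathbf{r}_1(t_1) - \mathbf{r}_2(t_2)}{|\mathbf{r}_1(t_1) - \mathbf{r}_2(t_2)|^3}\cdot\left(\frac{\partial\mathbf{r}_1}{\partial t_1}\times\frac{\partial\mathbf{r}_2}{\partial t_2}\right) dt_1\, dt_2
= -\frac{1}{4\pi}\int_{T^2}\mathbf{n}\cdot\left(\frac{\partial\mathbf{n}}{\partial t_1}\times\frac{\partial\mathbf{n}}{\partial t_2}\right) dt_1\, dt_2 \qquad (12.72)$$

is seen to be (minus) the winding number of the map

$$\mathbf{n} : [0,1]\times[0,1] \to S^2 \qquad (12.73)$$

of the 2-torus into the sphere. Our previous results on maps into the 2-sphere
therefore confirm our Ampère-law intuition that $\mathrm{Lk}(\gamma_1, \gamma_2)$ is an integer. The
linking number is also a topological invariant, being unchanged under any
deformation of the curves that does not cause one to pass through the other.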


An important application of these ideas occurs in biology, where the
curves are the two complementary strands of a closed loop of DNA. We can
think of two such parallel curves as forming the edges of a ribbon $\{\gamma_1, \gamma_2\}$ of
width $\epsilon$. Let us denote by $\gamma$ the curve $\mathbf{r}(t)$ running along the axis of the
ribbon midway between $\gamma_1$ and $\gamma_2$. The unit tangent to $\gamma$ at the point $\mathbf{r}(t)$ is

$$\mathbf{t}(t) = \frac{\dot{\mathbf{r}}(t)}{|\dot{\mathbf{r}}(t)|}, \qquad (12.74)$$

where, as usual, the dots denote differentiation with respect to $t$. We also
introduce a unit vector $\mathbf{u}(t)$ that is perpendicular to $\mathbf{t}(t)$ and lies in the
ribbon, pointing from $\mathbf{r}_1(t)$ to $\mathbf{r}_2(t)$.

Figure 12.10: An oriented ribbon $\{\gamma_1, \gamma_2\}$ showing the vectors $\mathbf{t}$ and $\mathbf{u}$.


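We will assign a common value of the parameter $t$ to a point on $\gamma$ and the
points nearest to $\mathbf{r}(t)$ on $\gamma_1$ and $\gamma_2$. Consequently

$$\mathbf{r}_1(t) = \mathbf{r}(t) - \tfrac{1}{2}\epsilon\,\mathbf{u}(t), \qquad
\mathbf{r}_2(t) = \mathbf{r}(t) + \tfrac{1}{2}\epsilon\,\mathbf{u}(t). \qquad (12.75)$$

We can express $\dot{\mathbf{u}}$ as

$$\dot{\mathbf{u}} = \boldsymbol{\omega}\times\mathbf{u} \qquad (12.76)$$

for some angular-velocity vector $\boldsymbol{\omega}(t)$. The quantity

$$\mathrm{Tw} = \frac{1}{2\pi}\oint_\gamma (\boldsymbol{\omega}\cdot\mathbf{t})\, dt \qquad (12.77)$$

is called the Twist of the ribbon. It is not usually an integer, and is a
property of the ribbon $\{\gamma_1, \gamma_2\}$ itself, being independent of the choice of
parameterization $t$.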

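If we set $\mathbf{r}_1(t)$ and $\mathbf{r}_2(t)$ equal to the single axis curve $\mathbf{r}(t)$ in the integrand
of (12.70), the resulting “self-linking” integral, or Writhe,

$$\mathrm{Wr} \stackrel{\mathrm{def}}{=} \frac{1}{4\pi}\oint_\gamma\oint_\gamma\frac{(\mathbf{r}(t_1) - \mathbf{r}(t_2))\cdot(\dot{\mathbf{r}}(t_1)\times\dot{\mathbf{r}}(t_2))}{|\mathbf{r}(t_1) - \mathbf{r}(t_2)|^3}\, dt_1\, dt_2 \qquad (12.78)$$

remains convergent despite the factor of $|\mathbf{r}(t_1) - \mathbf{r}(t_2)|^3$ in the denominator.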


However, if we try to achieve this substitution by making the width of the
ribbon $\epsilon$ tend to zero, we find that the vector $\mathbf{n}(t_1, t_2)$ abruptly reverses its
direction as $t_1$ passes $t_2$. In the limit of infinitesimal width this violent motion
provides a delta-function contribution

$$-2(\boldsymbol{\omega}\cdot\mathbf{t})\,\delta(t_1 - t_2)\, dt_1\, dt_2 \qquad (12.79)$$

to the 2-sphere area swept out by $\mathbf{n}$, and this contribution is invisible to the
Writhe integral. The Writhe is a property only of the overall shape of the
axis curve $\gamma$, and is independent both of the ribbon that contains it, and of
the choice of parameterization. The linking number, on the other hand, is
independent of $\epsilon$, so the $\epsilon \to 0$ limit of the linking-number integral is not the
integral of the $\epsilon \to 0$ limit of its integrand. Instead we have

$$\mathrm{Lk}(\gamma_1, \gamma_2) = \frac{1}{2\pi}\oint_\gamma(\boldsymbol{\omega}\cdot\mathbf{t})\, dt
+ \frac{1}{4\pi}\oint_\gamma\oint_\gamma\frac{(\mathbf{r}(t_1) - \mathbf{r}(t_2))\cdot(\dot{\mathbf{r}}(t_1)\times\dot{\mathbf{r}}(t_2))}{|\mathbf{r}(t_1) - \mathbf{r}(t_2)|^3}\, dt_1\, dt_2. \qquad (12.80)$$


This formula

$$\mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr} \qquad (12.81)$$

is known as the Călugăreanu-White-Fuller relation, and is the basis for the
claim, made in the previous section, that the worldline of an extended particle
with an exchange ($\mathrm{Wr} = \pm 1$) can be deformed into a worldline with a $2\pi$
rotation ($\mathrm{Tw} = \pm 1$) without changing the topologically invariant linking
number.


Figure 12.11: Cutting and reassembling the domain of integration in (12.83). 


By setting

$$\mathbf{n}(t_1, t_2) = \frac{\mathbf{r}(t_1) - \mathbf{r}(t_2)}{|\mathbf{r}(t_1) - \mathbf{r}(t_2)|} \qquad (12.82)$$

we can express the Writhe as

$$\mathrm{Wr} = -\frac{1}{4\pi}\int_{T^2}\mathbf{n}\cdot\left(\frac{\partial\mathbf{n}}{\partial t_1}\times\frac{\partial\mathbf{n}}{\partial t_2}\right) dt_1\, dt_2, \qquad (12.83)$$

but we must take care to recognize that this new $\mathbf{n}(t_1, t_2)$ is discontinuous
across the line $t = t_1 = t_2$. It is equal to $\mathbf{t}(t)$ for $t_1$ infinitesimally larger
than $t_2$, and equal to $-\mathbf{t}(t)$ when $t_1$ is infinitesimally smaller than $t_2$. By
cutting the square domain of integration and reassembling it into a
rhomboid, as shown in figure 12.11, we obtain a continuous integrand and see
that the Writhe is (minus) the 2-sphere area (counted with multiplicities and
divided by $4\pi$) of a region whose boundary is composed of two curves: $\Gamma$, the
tangent indicatrix, or tantrix, on which $\mathbf{n} = \mathbf{t}(t)$, and its oppositely oriented
antipodal counterpart $\Gamma'$ on which $\mathbf{n} = -\mathbf{t}(t)$.

The 2-sphere area $\Omega(\Gamma)$ bounded by $\Gamma$ is only determined by $\Gamma$ up to the
addition of integer multiples of $4\pi$. Taking note that the “wrong” orientation
of the boundary $\Gamma$ (see figure 12.11 again) compensates for the minus sign
before the integral in (12.83), we have

$$4\pi\,\mathrm{Wr} = 2\Omega(\Gamma) + 4\pi n. \qquad (12.84)$$

Thus,

$$\mathrm{Wr} = \frac{1}{2\pi}\Omega(\Gamma), \quad \text{mod } 1. \qquad (12.85)$$

We can do better than (12.85) once we realize that by allowing crossings we
can continuously deform any closed curve into a perfect circle. Each
self-crossing causes Lk and Wr (but not Tw which, being a local functional, does
not care about crossings) to jump by $\pm 2$. For a perfect circle $\mathrm{Wr} = 0$ whilst
$\Omega = 2\pi$. We therefore have an improved estimate of the additive integer that
is left undetermined by $\Gamma$, and from it we obtain

$$\mathrm{Wr} = 1 + \frac{1}{2\pi}\Omega(\Gamma), \quad \text{mod } 2. \qquad (12.86)$$


This result is due to Brock Fuller.⁴

We can use our ribbon language to describe conformational transitions in
long molecules. The elastic energy of a closed rod (or DNA molecule) can be
approximated by

$$E = \oint_\gamma\left\{\tfrac{1}{2}\alpha(\boldsymbol{\omega}\cdot\mathbf{t})^2 + \tfrac{1}{2}\beta\kappa^2\right\} ds. \qquad (12.87)$$

Here we are parameterizing the curve by its arc-length $s$. The constant $\alpha$ is
the torsional stiffness coefficient, $\beta$ is the flexural stiffness, and

$$\kappa(s) = \left|\frac{d^2\mathbf{r}(s)}{ds^2}\right| \qquad (12.88)$$

is the local curvature. Suppose that our molecule has linking number $n$, i.e.
it was twisted $n$ times before the ends were joined together to make a loop.

⁴ F. Brock Fuller, Proc. Natl. Acad. Sci. USA 75 (1978) 3557-61.


Figure 12.12: A molecule initially with Lk = 3, Tw = 3, Wr = 0 writhes to
a new configuration with Lk = 3, Tw = 0, Wr = 3.

When $\beta \gg \alpha$ the molecule will minimize its bending energy by forming a
planar circle with $\mathrm{Wr} \approx 0$ and $\mathrm{Tw} \approx n$. If we increase $\alpha$, or decrease $\beta$, there
will come a point at which the molecule will seek to save torsional energy at
the expense of bending, and will suddenly writhe into a new configuration
with $\mathrm{Wr} \approx n$ and $\mathrm{Tw} \approx 0$. Such twist-to-writhe transformations will be
familiar to anyone who has struggled to coil a garden hose or electric cable.


12.5 Further exercises and problems 


Exercise 12.7: A two-form is expressed in Cartesian co-ordinates as

$$\omega = \frac{1}{r^3}\,(z\, dx\, dy + x\, dy\, dz + y\, dz\, dx),$$

where $r = \sqrt{x^2 + y^2 + z^2}$.

a) Evaluate $d\omega$ for $r \ne 0$.

b) Evaluate the integral

$$\Phi = \int_P \omega$$

over the infinite plane $P = \{-\infty < x < \infty,\ -\infty < y < \infty,\ z = 1\}$.

c) A sphere is embedded into $\mathbb{R}^3$ by the map $\varphi$, which takes the point
$(\theta, \phi) \in S^2$ to the point $(x, y, z) \in \mathbb{R}^3$, where

$$x = R\cos\phi\sin\theta, \qquad y = R\sin\phi\sin\theta, \qquad z = R\cos\theta.$$

Pull back $\omega$ and find the 2-form $\varphi^*\omega$ on the sphere. (Hint: The form
$\varphi^*\omega$ is both familiar and simple. If you end up with an intractable mess
of trigonometric functions, you have made an algebraic error.)

d) By exploiting the result of part c), or otherwise, evaluate the integral

$$\Phi = \int_{S^2(R)}\omega,$$

where $S^2(R)$ is the surface of a two-sphere of radius $R$ centered at the
origin.


The following four exercises all explore the same geometric facts relating to 
Stokes’ theorem and the area 2-form of a sphere, but in different physical 
settings. 


Exercise 12.8: A flywheel of moment of inertia $I$ can rotate without friction
about an axle whose direction is specified by a unit vector $\mathbf{n}$. The flywheel and
axle are initially stationary. The direction $\mathbf{n}$ of the axle is made to describe a
simple closed curve $\gamma = \partial\Omega$ on the unit sphere, and is then left stationary.

Figure 12.13: Flywheel

Show that once the axle has returned to rest in its initial direction, the flywheel
has also returned to rest, but has rotated through an angle $\theta = \mathrm{Area}(\Omega)$ when
compared with its initial orientation. The area of $\Omega$ is to be counted as positive
if the path $\gamma$ surrounds it in a clockwise sense, and negative otherwise. Observe
that the path $\gamma$ bounds two regions with opposite orientations. Taking into
account the fact that we cannot define the rotation angle at intermediate
steps, show that the area of either region can be used to compute $\theta$, the
results being physically indistinguishable. (Hint: Show that the component
$L_Z = I(\dot\psi + \dot\phi\cos\theta)$ of the flywheel’s angular momentum along the axle is a
constant of the motion.)


Exercise 12.9: A ball of unit radius rolls without slipping on a table. The
ball moves in such a way that the point in contact with the table describes a
closed path $\gamma = \partial\Omega$ on the ball. (The corresponding path on the table will not
necessarily be closed.) Show that the final orientation of the ball will be such
that it has rotated, when compared with its initial orientation, through an
angle $\phi = \mathrm{Area}(\Omega)$ about a vertical axis through its center. As in the previous
problem, the area is counted positive if $\gamma$ encircles $\Omega$ in an anti-clockwise sense.
(Hint: recall the no-slip rolling condition $\dot\psi + \dot\phi\cos\theta = 0$ from (11.29).)


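Exercise 12.10: Let a curve in $\mathbb{R}^3$ be parameterized by its arc length $s$ as $\mathbf{r}(s)$.
Then the unit tangent to the curve is given by

$$\mathbf{t}(s) = \dot{\mathbf{r}} \equiv \frac{d\mathbf{r}}{ds}.$$

The principal normal $\mathbf{n}(s)$ and the binormal $\mathbf{b}(s)$ to the curve are defined by
the requirement that $\dot{\mathbf{t}} = \kappa\mathbf{n}$ with the curvature $\kappa(s)$ positive, and that $\mathbf{t}$, $\mathbf{n}$
and $\mathbf{b} = \mathbf{t}\times\mathbf{n}$ form a right-handed orthonormal frame.

Figure 12.14: Serret-Frenet frames.

a) Show that there exists a scalar $\tau(s)$, the torsion of the curve, such that
$\mathbf{t}$, $\mathbf{n}$ and $\mathbf{b}$ obey the Serret-Frenet relations

$$\begin{pmatrix} \dot{\mathbf{t}} \\ \dot{\mathbf{n}} \\ \dot{\mathbf{b}} \end{pmatrix}
= \begin{pmatrix} 0 & \kappa & 0 \\ -\kappa & 0 & \tau \\ 0 & -\tau & 0 \end{pmatrix}
\begin{pmatrix} \mathbf{t} \\ \mathbf{n} \\ \mathbf{b} \end{pmatrix}.$$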

b) Any pair of mutually orthogonal unit vectors $\mathbf{e}_1(s)$, $\mathbf{e}_2(s)$ perpendicular
to $\mathbf{t}$ and such that $\mathbf{e}_1\times\mathbf{e}_2 = \mathbf{t}$ can serve as an orthonormal frame for
vectors in the normal plane. A basis pair $\mathbf{e}_1$, $\mathbf{e}_2$ with the property

$$\dot{\mathbf{e}}_1\cdot\mathbf{e}_2 - \dot{\mathbf{e}}_2\cdot\mathbf{e}_1 = 0$$

is said to be parallel, or Fermi-Walker, transported along the curve. In
other words, a parallel-transported 3-frame $\mathbf{t}$, $\mathbf{e}_1$, $\mathbf{e}_2$ slides along the
curve $\mathbf{r}(s)$ in such a way that the component of its angular velocity in
the $\mathbf{t}$ direction is always zero. Show that the Serret-Frenet frame $\mathbf{e}_1 = \mathbf{n}$,
$\mathbf{e}_2 = \mathbf{b}$ is not parallel transported, but instead rotates at angular velocity
$\dot\theta = \tau$ with respect to a parallel-transported frame.

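c) Consider a finite segment of the curve such that the initial and final
Serret-Frenet frames are parallel, and so $\mathbf{t}(s)$ defines a closed path $\gamma = \partial\Omega$
on the unit sphere. Fill in the line-by-line justifications for the following
sequence of manipulations:

$$\begin{aligned}
\int_\gamma \tau\, ds &= \frac{1}{2}\int_\gamma(\mathbf{b}\cdot\dot{\mathbf{n}} - \mathbf{n}\cdot\dot{\mathbf{b}})\, ds \\
&= \frac{1}{2}\int_\gamma(\mathbf{b}\cdot d\mathbf{n} - \mathbf{n}\cdot d\mathbf{b}) \\
&= \frac{1}{2}\int_\Omega(d\mathbf{b}\cdot d\mathbf{n} - d\mathbf{n}\cdot d\mathbf{b}) \qquad (\star) \\
&= \frac{1}{2}\int_\Omega\{(d\mathbf{b}\cdot\mathbf{t})(\mathbf{t}\cdot d\mathbf{n}) - (d\mathbf{n}\cdot\mathbf{t})(\mathbf{t}\cdot d\mathbf{b})\} \\
&= \frac{1}{2}\int_\Omega\{(\mathbf{b}\cdot d\mathbf{t})(d\mathbf{t}\cdot\mathbf{n}) - (\mathbf{n}\cdot d\mathbf{t})(d\mathbf{t}\cdot\mathbf{b})\} \\
&= -\frac{1}{2}\int_\Omega \mathbf{t}\cdot(d\mathbf{t}\times d\mathbf{t}) \\
&= -\mathrm{Area}(\Omega).
\end{aligned}$$

(The line marked ‘$\star$’ is the one that requires most thought. How can we
define “$\mathbf{b}$” and “$\mathbf{n}$” in the interior of $\Omega$?)

d) Conclude that a Fermi-Walker transported frame will have rotated through
an angle $\theta = \mathrm{Area}(\Omega)$, compared to its initial orientation, by the time it
reaches the end of the curve.

The plane of transversely polarized light propagating in a monomode optical
fibre is Fermi-Walker transported, and this rotation can be studied
experimentally.⁵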


Exercise 12.11: Foucault’s pendulum (in disguise). A particle of mass $m$ is
constrained by a pair of frictionless plates to move in a plane $\Pi$ that passes
through the origin $O$. The particle is attracted to $O$ by a force $-\kappa\mathbf{r}$, and it
therefore executes harmonic motion within $\Pi$. The orientation of the plane,
specified by a normal vector $\mathbf{n}$, can be altered in such a way that $\Pi$ continues
to pass through the centre of attraction $O$.

a) Show that the constrained motion is described by the equation

$$m\ddot{\mathbf{r}} + \kappa\mathbf{r} = \lambda(t)\mathbf{n},$$

and determine $\lambda(t)$ in terms of $m$, $\mathbf{n}$ and $\dot{\mathbf{r}}$.

b) Initially the particle motion is given by

$$\mathbf{r}(t) = \mathbf{A}\cos(\omega t + \phi).$$

Now assume that $\mathbf{n}$ changes direction slowly compared to the frequency
$\omega = \sqrt{\kappa/m}$. Seek a solution in the form

$$\mathbf{r}(t) = \mathbf{A}(t)\cos(\omega t + \phi),$$

and show that $\dot{\mathbf{A}} = -\mathbf{n}(\dot{\mathbf{n}}\cdot\mathbf{A})$. Deduce that $|\mathbf{A}|$ remains constant,
and so $\dot{\mathbf{A}} = \boldsymbol{\omega}\times\mathbf{A}$ for some angular velocity vector $\boldsymbol{\omega}$. Show that $\boldsymbol{\omega}$ is
perpendicular to $\mathbf{n}$.

c) Show that the results of part b) imply that the direction of oscillation $\mathbf{A}$
is “parallel transported,” in the sense of the previous problem. Conclude
that if $\mathbf{n}$ slowly describes a closed loop $\gamma = \partial\Omega$ on the unit sphere,
then the direction of oscillation $\mathbf{A}$ ends up rotated through an angle
$\theta = \mathrm{Area}(\Omega)$.


The next exercise introduces a clever trick for solving some of the non-linear 
partial differential equations of field theory. The class of equations to which 
it and its generalizations are applicable is rather restricted, but when they 
work they provide a complete multi-soliton solution. 


⁵ A. Tomita, R. Y. Chiao, Phys. Rev. Lett. 57 (1986) 937-940.

Problem 12.12: In this problem you will find the spin field $\mathbf{n}(x)$ that minimizes
the energy functional

$$E[\mathbf{n}] = \frac{1}{2}\int_{\mathbb{R}^2}\left(|\nabla n^1|^2 + |\nabla n^2|^2 + |\nabla n^3|^2\right) dx^1 dx^2$$

for a given positive winding number $N$.


a) Use the results of exercise 12.6 to write the winding number $N$, defined
in (12.35), and the energy functional $E[\mathbf n]$ as

$$ 4\pi N = \int \frac{4}{(1+\xi^2+\eta^2)^2}\left(\partial_1\xi\,\partial_2\eta - \partial_1\eta\,\partial_2\xi\right) dx^1 dx^2, $$

$$ E[\mathbf n] = \frac{1}{2}\int \frac{4}{(1+\xi^2+\eta^2)^2}\left((\partial_1\xi)^2 + (\partial_2\xi)^2 + (\partial_1\eta)^2 + (\partial_2\eta)^2\right) dx^1 dx^2, $$

where $\xi$ and $\eta$ are stereographic co-ordinates on $S^2$ specifying the
direction of the unit vector $\mathbf n$.

b) Deduce the inequality

$$ E - 4\pi N \equiv \frac{1}{2}\int \frac{4}{(1+\xi^2+\eta^2)^2}\left|(\partial_1 + i\partial_2)(\xi + i\eta)\right|^2 dx^1 dx^2 \ge 0. $$

c) Deduce that, for winding number $N > 0$, the minimum-energy solutions
have energy $E = 4\pi N$ and are obtained by solving the first-order linear
partial differential equation

$$ \left(\frac{\partial}{\partial x^1} + i\frac{\partial}{\partial x^2}\right)(\xi + i\eta) = 0. $$

d) Solve the partial differential equation in part c), and hence show that
the minimal-energy solutions with winding number $N > 0$ are given by

$$ \xi + i\eta = \lambda\,\frac{(z - a_1)\cdots(z - a_N)}{(z - b_1)\cdots(z - b_N)}, $$

where $z = x^1 + i x^2$, and $\lambda$, $a_1,\ldots,a_N$ and $b_1,\ldots,b_N$ are
arbitrary complex numbers, except that no $a$ may coincide with any $b$. This is
the solution that we displayed at the end of section 12.4.2.

e) Repeat the analysis for $N < 0$. Show that the solutions are given in
terms of rational functions of $\bar z = x^1 - i x^2$.
The idea of combining the energy functional and the topological charge into
a single, manifestly positive, functional is due to Evgueny Bogomol’nyi. The
resulting first-order linear equation is therefore called a Bogomol’nyi equation.
If we had tried to find a solution directly in terms of $\mathbf n$, we would have
ended up with a horribly non-linear second-order partial differential equation.


Figure 12.15: A slice through the embedding of two-dimensional Lobachevski
space into three-dimensional Minkowski space, showing the stereographic
parametrization of the embedded space by the Poincaré disc $X^2 + Y^2 < R^2$.


Exercise 12.13: Lobachevski space. The hyperbolic plane of Lobachevski
geometry can be realized by embedding the $Z \ge R$ branch of the two-sheeted
hyperboloid $Z^2 - X^2 - Y^2 = R^2$ into a Minkowski space with metric
$ds^2 = -dZ^2 + dX^2 + dY^2$.

We can parametrize the embedded surface by making an “imaginary radius”
version of the stereographic map, in which the point P on the hyperboloid
is labelled by the co-ordinates of the point Q on the X-Y plane (see figure
12.15).

i) Show that this embedding induces the metric

$$ g(\ ,\ ) = \frac{4R^4}{(R^2 - X^2 - Y^2)^2}\,(dX\otimes dX + dY\otimes dY), \qquad X^2 + Y^2 < R^2, $$

of the Poincaré disc model (see problem 1.7) on the hyperboloid.
ii) Use the induced metric to show that the area of a disc of hyperbolic
radius $\rho$ is given by

$$ {\rm Area} = 4\pi R^2\sinh^2\left(\frac{\rho}{2R}\right) = 2\pi R^2\left(\cosh(\rho/R) - 1\right), $$

and so is only given by $\pi\rho^2$ when $\rho$ is small compared to the scale $R$ of
the hyperbolic space. It suffices to consider discs with their centres at
the origin. You will first need to show that the hyperbolic distance $\rho$
from the centre of the disc to a point at Euclidean distance $r$ is

$$ \rho = R\ln\left(\frac{R + r}{R - r}\right). $$


Exercise 12.14: Faraday’s “flux rule” for computing the electromotive force
$\mathcal E$ in a circuit containing a thin moving wire is usually derived by the
following manipulations:

$$ \begin{aligned}
\mathcal E &= \oint_{\partial\Omega} (\mathbf E + \mathbf v\times\mathbf B)\cdot d\mathbf r \\
&= \int_\Omega {\rm curl}\,\mathbf E\cdot d\mathbf S - \oint_{\partial\Omega} \mathbf B\cdot(\mathbf v\times d\mathbf r) \\
&= -\int_\Omega \frac{\partial\mathbf B}{\partial t}\cdot d\mathbf S - \oint_{\partial\Omega} \mathbf B\cdot(\mathbf v\times d\mathbf r) \\
&= -\frac{d}{dt}\int_\Omega \mathbf B\cdot d\mathbf S.
\end{aligned} $$

a) Show that if we parameterize the surface $\Omega$ as $x^\mu(u,v,\tau)$, with $u,v$
labelling points on $\Omega$, and $\tau$ parametrizing the evolution of $\Omega$, then the
corresponding manipulations in the covariant differential-form version of
Maxwell’s equations lead to

$$ \frac{d}{d\tau}\int_\Omega F = \int_\Omega \mathcal L_V F = \int_{\partial\Omega} i_V F = -\int_{\partial\Omega} f, $$

where $V^\mu = \partial x^\mu/\partial\tau$ and $f = -i_V F$.
b) Show that if we take $\tau$ to be the proper time along the world-line of each
element of $\Omega$, then $V$ is the 4-velocity

$$ V^\mu = \frac{1}{\sqrt{1 - v^2}}\,(1, \mathbf v), $$

and $f = -i_V F$ becomes the one-form corresponding to the Lorentz-force
4-vector.

It is not clear that the terms in this covariant form of Faraday’s law can be
given any physical interpretation outside the low-velocity limit. When parts
of $\partial\Omega$ have different velocities, the relation of the integrals to
measurements made at fixed co-ordinate time requires thought.⁶


⁶ See E. Marx, Journal of the Franklin Institute, 300 (1975) 353-364.

The next pair of exercises explores some physics appearances of the continuum
Hopf linking number (12.65).


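Exercise 12.15: The equations governing the motion of an incompressible
inviscid fluid are $\nabla\cdot\mathbf v = 0$ and Euler’s equation

$$ \frac{D\mathbf v}{Dt} \stackrel{\rm def}{=} \frac{\partial\mathbf v}{\partial t} + (\mathbf v\cdot\nabla)\mathbf v = -\nabla P. $$

Recall that the operator $\partial/\partial t + \mathbf v\cdot\nabla$, here written as
$D/Dt$, is called the convective derivative.
a) Take the curl of Euler’s equation to show that if $\boldsymbol\omega = \nabla\times\mathbf v$
is the vorticity then

$$ \frac{D\boldsymbol\omega}{Dt} \stackrel{\rm def}{=} \frac{\partial\boldsymbol\omega}{\partial t} + (\mathbf v\cdot\nabla)\boldsymbol\omega = (\boldsymbol\omega\cdot\nabla)\mathbf v. $$

b) Combine Euler’s equation with part a) to show that

$$ \frac{D}{Dt}(\mathbf v\cdot\boldsymbol\omega) = \nabla\cdot\left\{\boldsymbol\omega\left(\tfrac{1}{2}\mathbf v^2 - P\right)\right\}. $$

c) Show that if $\Omega$ is a volume moving with the fluid, and $f$ is a scalar
function, then

$$ \frac{d}{dt}\int_\Omega f\,dV = \int_\Omega \frac{Df}{Dt}\,dV. $$

d) Conclude that when $\boldsymbol\omega$ is zero at infinity the helicity

$$ I = \int \mathbf v\cdot(\nabla\times\mathbf v)\,dV = \int \mathbf v\cdot\boldsymbol\omega\,dV $$

is a constant of the motion.


The helicity measures the Hopf linking number of the vortex lines. The
discovery⁷ of its conservation launched the field of topological fluid dynamics.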


Exercise 12.16: Let $\mathbf B = \nabla\times\mathbf A$ and
$\mathbf E = -\partial\mathbf A/\partial t - \nabla\phi$ be the magnetic and electric
fields in an incompressible and perfectly conducting fluid. In such a
fluid, the co-moving electromotive force $\mathbf E + \mathbf v\times\mathbf B$ must
vanish everywhere.

a) Use Maxwell’s equations to show that

$$ \frac{\partial\mathbf A}{\partial t} = \mathbf v\times(\nabla\times\mathbf A) - \nabla\phi, $$

$$ \frac{\partial\mathbf B}{\partial t} = \nabla\times(\mathbf v\times\mathbf B). $$

b) From part a) show that the convective derivative of $\mathbf A\cdot\mathbf B$ is
given by

$$ \frac{D}{Dt}(\mathbf A\cdot\mathbf B) = \nabla\cdot\left\{\mathbf B\,(\mathbf A\cdot\mathbf v - \phi)\right\}. $$

c) By using the same reasoning as in the previous problem, and assuming
that $\mathbf B$ is zero at infinity, conclude that Woltjer’s invariant,

$$ \int \mathbf A\cdot\mathbf B\,dV, $$

is a constant of the motion.


This result shows that the Hopf linking number of the magnetic field lines is
independent of time. It is an essential ingredient in the geodynamo theory of
the Earth’s magnetic field.


⁷ H. K. Moffatt, J. Fluid Mech. 35 (1969) 117.


Chapter 13 


An Introduction to Differential 
Topology 


Topology is the study of the consequences of continuity. We all know that 
a continuous real function defined on a connected interval and positive at 
one point and negative at another must take the value zero at some point 
between. This fact seems obvious—although a course of real analysis will 
convince you of the need for a proof. A less obvious fact, but one that 
follows from the previous one, is that a continuous function defined on the 
unit circle must possess two diametrically opposite points at which it takes the
same value. To see that this is so, consider $f(\theta + \pi) - f(\theta)$. This difference
(if not initially zero, in which case there is nothing further to prove) changes
sign as $\theta$ is advanced through $\pi$, because the two terms exchange roles. It must
therefore be zero somewhere. This observation has practical application in daily
life: Our local coffee shop contains four-legged tables that wobble because
the floor is not level. They are round tables, however, and because they
possess no misguided levelling screws all four legs have the same length. We
are therefore guaranteed that by rotating the table about its center through
an angle of less than $\pi/2$ we will find a stable location. A ninety-degree
rotation interchanges the pair of legs that are both on the ground with the 
pair that are rocking, and at the change-over point all four legs must be 
simultaneously on the ground. 
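
The sign-change argument is constructive, and a numerical sketch may make it
concrete. The fragment below is only an illustration: the names
`antipodal_gap`, `find_equal_antipodes` and the sample function
`sample_height` are assumptions introduced here and do not come from the text.
It bisects on $g(\theta) = f(\theta+\pi) - f(\theta)$ over $[0,\pi]$, relying on
$g(\pi) = -g(0)$ for a $2\pi$-periodic $f$.

```python
import math

def antipodal_gap(f, theta):
    """g(theta) = f(theta + pi) - f(theta) for a 2*pi-periodic f."""
    return f(theta + math.pi) - f(theta)

def find_equal_antipodes(f, tol=1e-12):
    """Bisect g on [0, pi]; g(pi) = -g(0), so a sign change is bracketed."""
    lo, hi = 0.0, math.pi
    g_lo = antipodal_gap(f, lo)
    if g_lo == 0.0:
        return lo  # already equal: nothing further to prove
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = antipodal_gap(f, mid)
        if g_mid == 0.0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # sign change lies in the upper half
        else:
            hi = mid                # sign change lies in the lower half
    return 0.5 * (lo + hi)

def sample_height(theta):
    # Hypothetical smooth periodic function, chosen only for illustration.
    return (math.sin(theta) + 0.3 * math.cos(2 * theta)
            + 0.2 * math.sin(3 * theta + 1.0))

theta_star = find_equal_antipodes(sample_height)
print(theta_star, antipodal_gap(sample_height, theta_star))
```

Bisection needs only continuity, which is exactly the hypothesis used in the
argument above.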

Similar effects with a practical significance for physics appear when we 
try to extend our vector and tensor calculus from a local region to an entire 
manifold. A smooth field of vectors tangent to the sphere $S^2$ will always
possess a zero, i.e. a point at which the vector field vanishes. On
the torus $T^2$, however, we can construct a nowhere-zero vector field. This
shows that the global topology of the manifold influences the way in which 
the tangent spaces are glued together to form the tangent bundle. To study 
this influence in a systematic manner we need first to understand how to 
characterize the global structure of a manifold, and then to see how this 
structure affects the mathematical and physical objects that live on it. 


13.1 Homeomorphism and diffeomorphism 


In the previous chapter we met with a number of topological invariants asso- 
ciated with mappings. These homotopy invariants were unaffected by contin- 
uous deformations of a map, and served to distinguish between topologically 
distinct mappings. Similarly, homology invariants help classify topologically 
distinct manifolds. The analogue of the winding number is the set of Betti 
numbers of a manifold. If two manifolds have different Betti numbers they 
are certainly distinct. Unfortunately, if two manifolds have the same Betti 
numbers, we cannot be sure that they are topologically identical. It is a Holy 
Grail of topology to find a complete set of invariants such that having them 
all coincide would be enough to say that two manifolds were topologically 
the same. 

In the previous paragraph we were deliberately vague in our use of the
terms “distinct” and “same”. Two topological spaces (spaces equipped
with a definition of what is to be considered an open set) are regarded as
being the “same”, or homeomorphic, if there is a one-to-one, onto, continuous
map between them whose inverse is also continuous. Manifolds come with the
additional structure of differentiability: we may therefore talk of “smooth”
maps, meaning that their expression in coordinates is infinitely ($C^\infty$)
differentiable. We regard two manifolds as being the “same”, or diffeomorphic, if
there is a one-to-one onto $C^\infty$ map between them whose inverse is also $C^\infty$.
The distinction between homeomorphism and diffeomorphism sounds like a
mere technical nicety, but it has consequences for physics. Edward Witten
discovered¹ that there are 992 distinct 11-spheres. These are manifolds that
are all homeomorphic to the 11-sphere, but diffeomorphically inequivalent.
This fact is crucial for the cancellation of global gravitational anomalies in
the $E_8\times E_8$ or SO(32) symmetric superstring theories.

Since we are interested in the consequences of topology for calculus, we
shall restrict ourselves to the interpretation “same” = diffeomorphic.


¹ E. Witten, Comm. Math. Phys. 117 (1986), 197.


13.2 Cohomology 


Betti numbers arise in answer to what seems like a simple calculus problem: 
when can a vector field whose divergence vanishes be written as the curl of 
something? We shall see that the answer depends on the global structure of 
the space the field inhabits. 


13.2.1 Retractable spaces: converse of Poincaré lemma 


Poincaré’s lemma asserts that $d^2 = 0$. In traditional vector-calculus language
this reduces to the statements curl (grad $\phi$) = 0 and div (curl $\mathbf w$) = 0. We
often assume that the converse is true: If curl $\mathbf v$ = 0, we expect that we can
find a $\phi$ such that $\mathbf v$ = grad $\phi$, and if div $\mathbf v$ = 0 that we can find
a $\mathbf w$ such that $\mathbf v$ = curl $\mathbf w$. You know a formula for the first case:

$$ \phi(x) = \int_{x_0}^{x} \mathbf v\cdot d\mathbf x, \tag{13.1} $$

but you probably do not know the corresponding formula for $\mathbf w$. Using
differential forms, and provided the space in which these forms live has suitable
topological properties, it is straightforward to find a solution for the general
problem: If $\omega$ is closed, meaning that $d\omega = 0$, find $\chi$ such that
$\omega = d\chi$.

The “suitable topological properties” referred to in the previous paragraph
amount to the requirement that the space be retractable. Suppose that the closed
form $\omega$ is defined in a domain $\Omega$. We say that $\Omega$ is retractable to the
point $O$ if there exists a smooth map $\varphi_t : \Omega \to \Omega$ which depends
continuously on a parameter $t \in [0,1]$ and for which $\varphi_1(x) = x$ and
$\varphi_0(x) = O$. Applying this retraction map to the form, we will then have
$\varphi_1^*\omega = \omega$ and $\varphi_0^*\omega = 0$. Let us set
$\varphi_t(x^\mu) = x^\mu(t)$. Define $\eta(x,t)$ to be the velocity-vector field that
corresponds to the co-ordinate flow:

$$ \frac{dx^\mu}{dt} = \eta^\mu(x,t). \tag{13.2} $$

An easy exercise, using the interpretation of the Lie derivative in (11.41),
shows that

$$ \frac{d}{dt}\left(\varphi_t^*\omega\right) = \mathcal L_\eta\left(\varphi_t^*\omega\right). \tag{13.3} $$

We now use the infinitesimal homotopy relation and our assumption that
$d\omega = 0$, and hence (from exercise 12.3) that $d(\varphi_t^*\omega) = 0$, to write

$$ \mathcal L_\eta(\varphi_t^*\omega) = (i_\eta d + d\,i_\eta)(\varphi_t^*\omega) = d\left[i_\eta(\varphi_t^*\omega)\right]. \tag{13.4} $$

Using this, we can integrate up with respect to $t$ to find

$$ \omega = \varphi_1^*\omega - \varphi_0^*\omega = d\left(\int_0^1 i_\eta(\varphi_t^*\omega)\,dt\right). \tag{13.5} $$

Thus

$$ \chi = \int_0^1 i_\eta(\varphi_t^*\omega)\,dt \tag{13.6} $$

solves our problem.

This magic formula for $\chi$ makes use of nearly all the “calculus on
manifolds” concepts that we have introduced so far. The notation is so powerful
that it has also suppressed nearly everything that a traditionally-educated
physicist would find familiar. We will therefore unpack the symbols by means
of a concrete example. Let us take $\Omega$ to be the whole of $\mathbb R^3$. This can be
retracted to the origin via the map $\varphi_t(x^\mu) = x^\mu(t) = t\,x^\mu$. The velocity
field whose flow gives

$$ x^\mu(t) = t\,x^\mu(0) $$

is $\eta^\mu(x,t) = x^\mu/t$. To verify this, compute

$$ \frac{dx^\mu(t)}{dt} = x^\mu(0) = \frac{1}{t}\,x^\mu(t), $$

so $x^\mu(t)$ is indeed the solution to

$$ \frac{dx^\mu}{dt} = \eta^\mu(x(t),t). $$

Now let us apply this retraction to $\omega = A\,dy\,dz + B\,dz\,dx + C\,dx\,dy$ with

$$ d\omega = \left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\right) dx\,dy\,dz = 0. \tag{13.7} $$

The pull-back $\varphi_t^*$ gives

$$ \varphi_t^*\omega = A(tx,ty,tz)\,d(ty)\,d(tz) + \hbox{(two similar terms)}. \tag{13.8} $$

The interior product with

$$ \eta = \frac{1}{t}\left(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} + z\frac{\partial}{\partial z}\right) \tag{13.9} $$

then gives

$$ i_\eta\varphi_t^*\omega = t\,A(tx,ty,tz)\,(y\,dz - z\,dy) + \hbox{(two similar terms)}. \tag{13.10} $$


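Finally we form the ordinary integral over $t$ to get

$$ \begin{aligned}
\chi = \int_0^1 i_\eta(\varphi_t^*\omega)\,dt
&= \left[\int_0^1 A(tx,ty,tz)\,t\,dt\right](y\,dz - z\,dy) \\
&\quad + \left[\int_0^1 B(tx,ty,tz)\,t\,dt\right](z\,dx - x\,dz) \\
&\quad + \left[\int_0^1 C(tx,ty,tz)\,t\,dt\right](x\,dy - y\,dx).
\end{aligned} \tag{13.11} $$

In this expression the integrals in the square brackets are just numerical
coefficients, i.e., the “dt” is not part of the one-form. It is instructive,
because not entirely trivial, to let “d” act on $\chi$ and verify that the
construction works. If we focus first on the term involving $A$, we find that
$d\left[\int_0^1 A(tx,ty,tz)\,t\,dt\right](y\,dz - z\,dy)$ can be grouped as

$$ \left[\int_0^1 \left\{2tA + t^2\left(x\frac{\partial A}{\partial x} + y\frac{\partial A}{\partial y} + z\frac{\partial A}{\partial z}\right)\right\} dt\right] dy\,dz
 - \left[\int_0^1 t^2\frac{\partial A}{\partial x}\,dt\right](x\,dy\,dz + y\,dz\,dx + z\,dx\,dy). \tag{13.12} $$

The first of these terms is equal to

$$ \left[\int_0^1 \frac{d}{dt}\left\{t^2 A(tx,ty,tz)\right\} dt\right] dy\,dz = A(x,y,z)\,dy\,dz, \tag{13.13} $$

which is part of $\omega$. The second term will combine with the terms involving
$B$, $C$, to become

$$ -\left[\int_0^1 t^2\left(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\right) dt\right](x\,dy\,dz + y\,dz\,dx + z\,dx\,dy), \tag{13.14} $$

which is zero by our hypothesis. Putting together the $A$, $B$, $C$ terms does,
therefore, reconstitute $\omega$.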
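
In vector language the components of the one-form (13.11) assemble into
$\mathbf w(\mathbf r) = \int_0^1 t\,\mathbf F(t\mathbf r)\times\mathbf r\,dt$ with
$\mathbf F = (A,B,C)$, and the claim is that curl $\mathbf w = \mathbf F$ whenever
div $\mathbf F = 0$. The following sketch checks this numerically. It is a hedged
illustration, not part of the text: the function names, the sample field
`sample_field`, the Simpson quadrature and the finite-difference curl are all
assumptions made here for the purpose of the check.

```python
def vector_potential(field, r, n=200):
    """w(r) = integral_0^1 t * field(t r) x r dt, by composite Simpson (n even)."""
    x, y, z = r

    def integrand(t):
        a, b, c = field(t * x, t * y, t * z)
        # components of t * (A, B, C) x (x, y, z)
        return (t * (b * z - c * y), t * (c * x - a * z), t * (a * y - b * x))

    h = 1.0 / n
    total = [0.0, 0.0, 0.0]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        values = integrand(k * h)
        for i in range(3):
            total[i] += weight * values[i]
    return tuple(h / 3.0 * s for s in total)

def curl(w, r, eps=1e-4):
    """Central-difference curl of a vector-valued function w at the point r."""
    def d(i, j):  # partial derivative of component i with respect to coordinate j
        rp, rm = list(r), list(r)
        rp[j] += eps
        rm[j] -= eps
        return (w(tuple(rp))[i] - w(tuple(rm))[i]) / (2.0 * eps)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def sample_field(x, y, z):
    # Hypothetical divergence-free field: dA/dx + dB/dy + dC/dz = y - y + 0 = 0.
    return (x * y + z, -0.5 * y * y + x * z, x)

point = (0.3, -0.7, 1.1)
recovered = curl(lambda r: vector_potential(sample_field, r), point)
print(recovered, sample_field(*point))
```

If the sample field is replaced by one with non-zero divergence, the recovered
curl no longer matches, in line with the role played by (13.14).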
13.2.2 Obstructions to exactness


The condition that $\Omega$ be retractable plays an essential role in the converse
to Poincaré’s lemma. In its absence, $d\omega = 0$ does not guarantee that there is a
$\chi$ such that $\omega = d\chi$. Consider, for example, a vector field $\mathbf v$ with
curl $\mathbf v$ = 0 in a two-dimensional annulus $\Omega = \{R_0 < |\mathbf r| < R_1\}$. In the
annulus (a non-retractable space) the condition that curl $\mathbf v$ = 0 does not
prohibit $\oint_\Gamma \mathbf v\cdot d\mathbf r$ being non-zero for some closed path $\Gamma$
encircling the central hole. When this line integral is non-zero then there can be
no single-valued $\chi$ such that $\mathbf v$ = grad $\chi$. If there were such a $\chi$, then

$$ \oint_\Gamma \mathbf v\cdot d\mathbf r = \chi(0) - \chi(0) = 0. \tag{13.15} $$

A non-zero value for $\oint_\Gamma \mathbf v\cdot d\mathbf r$ therefore constitutes an
obstruction to the existence of a $\chi$ such that $\mathbf v$ = grad $\chi$.
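
A standard concrete instance is the field $\mathbf v = (-y, x)/(x^2+y^2)$, which
is curl-free away from the origin yet has circulation $2\pi$ around the hole.
The sketch below evaluates the line integral numerically; the function names,
the midpoint quadrature and the comparison field grad$(xy)$ are illustrative
assumptions introduced here, not taken from the text.

```python
import math

def vortex_field(x, y):
    """Curl-free on the punctured plane, but not a gradient there."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def gradient_field(x, y):
    """grad(x*y) = (y, x): a genuine gradient, for comparison."""
    return (y, x)

def circulation(field, radius, n=2000):
    """Midpoint-rule estimate of the line integral of field around |r| = radius."""
    total = 0.0
    for k in range(n):
        t0 = 2.0 * math.pi * k / n
        t1 = 2.0 * math.pi * (k + 1) / n
        tm = 0.5 * (t0 + t1)
        vx, vy = field(radius * math.cos(tm), radius * math.sin(tm))
        dx = radius * (math.cos(t1) - math.cos(t0))
        dy = radius * (math.sin(t1) - math.sin(t0))
        total += vx * dx + vy * dy
    return total

print(circulation(vortex_field, 1.0), circulation(vortex_field, 2.0))
print(circulation(gradient_field, 1.0))
```

The vortex circulation is the same for every radius inside the annulus, as it
must be for a curl-free field, while the gradient field gives zero.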

Example: The sphere $S^2$ is not retractable: any attempt to pull its points
back to the north pole will necessarily tear a hole in the surface somewhere.
Related to this fact is that whilst the area 2-form $\sin\theta\,d\theta\,d\phi$ is
closed, it cannot be written as the $d$ of something. We can try to write

$$ \sin\theta\,d\theta\,d\phi = d\left[(1 - \cos\theta)\,d\phi\right], \tag{13.16} $$

but the 1-form $(1 - \cos\theta)\,d\phi$ is singular at the south pole, $\theta = \pi$.
We could try

$$ \sin\theta\,d\theta\,d\phi = d\left[(-1 - \cos\theta)\,d\phi\right], \tag{13.17} $$

but this is singular at the north pole, $\theta = 0$. There is no escape. We know
that

$$ \int_{S^2} \sin\theta\,d\theta\,d\phi = 4\pi, \tag{13.18} $$

but if $\sin\theta\,d\theta\,d\phi = d\chi$ then Stokes’ theorem says that

$$ \int_{S^2} \sin\theta\,d\theta\,d\phi = \int_{\partial S^2} \chi = 0 \tag{13.19} $$

because $\partial S^2 = \emptyset$. Again, a non-zero value for $\int\omega$ over some
boundary-less region has provided an obstruction to finding a $\chi$ such that
$\omega = d\chi$.
13.2.3 De Rham cohomology


We have seen that, sometimes, the condition $d\omega = 0$ allows us to find a $\chi$
such that $\omega = d\chi$, and sometimes it does not. If the region in which we seek
$\chi$ is retractable, we can always construct it. If the region is not retractable
there may be an obstruction to the existence of $\chi$. In order to describe the
various possibilities we introduce the language of cohomology, or more precisely de
Rham cohomology, named for the Swiss mathematician Georges de Rham
who did the most to create it.


For simplicity, suppose that we are working in a compact manifold $M$
without boundary. Let $\Omega^p(M) = \Lambda^p(T^*M)$ be the space of all smooth p-form
fields. It is a vector space over $\mathbb R$: we can add p-form fields and multiply them
by real constants, but, as is the vector space $C^\infty(M)$ of smooth functions on
$M$, it is infinite dimensional. The subspace $Z^p(M)$ of closed forms (those
with $d\omega = 0$) is also an infinite-dimensional vector space, and the same
is true of the space $B^p(M)$ of exact forms (those that can be written as
$\omega = d\chi$ for some globally defined $(p-1)$-form $\chi$). Now consider the space
$H^p = Z^p/B^p$, which is the space of closed forms modulo exact forms. In this
space we do not distinguish between two forms, $\omega_1$ and $\omega_2$, when there is a
$\chi$ such that $\omega_1 = \omega_2 + d\chi$. We say that $\omega_1$ and $\omega_2$ are
cohomologous in $H^p(M)$, and write $\omega_1 \sim \omega_2$. We will use the symbol
$[\omega]$ to denote the equivalence class of forms cohomologous to $\omega$. Now a
miracle happens! For a compact manifold $M$, the space $H^p(M)$ is finite
dimensional! It is called the p-th (de Rham) cohomology space of the manifold, and
depends only on the global topology of $M$. In particular, it does not depend on
any metric we may have chosen for $M$.


Sometimes we write $H^p_{\rm dR}(M,\mathbb R)$ to make clear that we are dealing with de
Rham cohomology, and that we are working with vector spaces over the real
numbers. This is because there is also a valuable space $H^p_{\rm dR}(M,\mathbb Z)$, where
we only allow multiplication by integers.


The cohomology space $H^p_{\rm dR}(M,\mathbb R)$ codifies all potential obstructions to
solving the problem of finding a $(p-1)$-form $\chi$ such that $d\chi = \omega$: we can
find such a $\chi$ if and only if $\omega$ is cohomologous to zero in
$H^p_{\rm dR}(M,\mathbb R)$. If $H^p_{\rm dR}(M,\mathbb R) = \{0\}$, which is the case if $M$ is
retractable, then all closed p-forms are cohomologous to zero. If
$H^p_{\rm dR}(M,\mathbb R) \ne \{0\}$, then some closed p-forms $\omega$ will not be
cohomologous to zero. We can test whether $\omega \sim 0 \in H^p_{\rm dR}(M,\mathbb R)$ by
forming suitable integrals.
13.3 Homology


The language of cohomology seems rather abstract. To understand its
topological origin it may be more intuitive to think about the spaces that are the
cohomology spaces’ vector-space duals. These homology spaces are simple to
understand pictorially.

The basic idea is that, given a region $\Omega$, we can find its boundary $\partial\Omega$.
Inspection of a few simple cases will soon lead to the conclusion that the
“boundary of a boundary” consists of nothing. In symbols, $\partial^2 = 0$. The
statement “$\partial^2 = 0$” is clearly analogous to “$d^2 = 0$,” and, pursuing the
analogy, we can construct a vector space of “regions” and define two “regions”
as being homologous if they differ by the boundary of another “region.”


13.3.1 Chains, cycles and boundaries 


We begin by making precise the vague notions of region and boundary. 


Simplicial complexes 


The set of all curves and surfaces in a manifold M is infinite dimensional, but 
the homology spaces are finite dimensional. Life would be much easier if we 
could use finite-dimensional spaces throughout. Mathematicians therefore 
do what any computationally-minded physicist would do: they approximate 
the smooth manifold by a discrete polygonal grid.² Were they interested in
distances, they would necessarily use many small polygons so as to obtain 
a good approximation to the detailed shape of the manifold. The global 
topology, though, can often be captured by a rather coarse discretization. 
The result of this process is to reduce a complicated problem in differential 
geometry to one of simple algebra. The resulting theory is therefore known 
as algebraic topology. 

It turns out to be convenient to approximate the manifold by generalized
triangles. We therefore dissect $M$ into line segments (if one-dimensional),
triangles (if two-dimensional), tetrahedra (if three-dimensional) or
higher-dimensional p-simplices (singular: simplex). The rules for the dissection are:
a) Every point must belong to at least one simplex.
b) A point can belong to only a finite number of simplices.
c) Two different simplices either have no points in common, or
   i) one is a face (or edge, or vertex) of the other,
   ii) the set of points in common is the whole of a shared face (or edge,
       or vertex).
The collection of simplices composing the dissected space is called a simplicial
complex. We will denote it by $S$.

We may not need many triangles to capture the global topology. For
example, figure 13.2 shows how a two-dimensional torus can be decomposed
into two 2-simplices (triangles) bounded by three 1-simplices (edges) $\alpha$, $\beta$,
$\gamma$, and with only a single 0-simplex (vertex) $P$. Computations are easier to
describe, however, if each simplex in the decomposition is uniquely specified
by its vertices. For this, we usually need a slightly finer dissection. Figure
13.3 shows a decomposition of the torus into 18 triangles, each of which is
uniquely labeled by three points drawn from a set of nine vertices. In this
figure vertices with identical labels are to be regarded as the same vertex,
as are the corresponding sides of triangles. Thus, each of the edges $P_1P_2$,
$P_2P_3$, $P_3P_1$ at the top of the figure is to be glued point-by-point to the
corresponding edge on the bottom of the figure; similarly along the sides.
The resulting simplicial complex then has 27 edges.


² This discrete approximation leads to what is known as simplicial homology.
Simplicial homology is rather primitive and old fashioned, having been supplanted
by singular homology and the theory of CW complexes. The modern definitions are
superior for proving theorems, but are less intuitive, and for smooth manifolds
lead to the same conclusions as the simpler-to-describe simplicial theory.


Figure 13.1: Triangles, or 2-simplices, that are a) allowed, b) not allowed in
a dissection. In b) the problem is that only parts of edges are in common.


Figure 13.2: A triangulation of the 2-torus. a) The torus as a rectangle
with periodic boundary conditions: The two edges labeled $\alpha$ will be glued
together point-by-point along the arrows when we reassemble the torus, and
so are to be regarded as a single edge. The two sides labeled $\beta$ will be glued
similarly. b) The assembled torus: All four P’s are now in the same place,
and correspond to a single point.


Figure 13.3: A second triangulation of the 2-torus.

We may triangulate the sphere $S^2$ as a tetrahedron with vertices $P_1$, $P_2$,
$P_3$, $P_4$. This dissection has six edges: $P_1P_2$, $P_1P_3$, $P_1P_4$, $P_2P_3$,
$P_2P_4$, $P_3P_4$, and four faces: $P_2P_3P_4$, $P_1P_3P_4$, $P_1P_2P_4$ and $P_1P_2P_3$.


p-chains 


We assign to simplices an orientation defined by the order in which we write
their defining vertices. The interchange of any pair of vertices reverses the
orientation, and we consider there to be a relative minus sign between
oppositely oriented but otherwise identical simplices: $P_2P_1P_3P_4 = -P_1P_2P_3P_4$.
We now construct abstract vector spaces $C_p(S,\mathbb R)$ of p-chains which have
oriented p-simplices as their basis vectors. The most general elements of
$C_2(S,\mathbb R)$, with $S$ being the tetrahedral triangulation of the sphere $S^2$, would
be

$$ a_1P_2P_3P_4 + a_2P_1P_3P_4 + a_3P_1P_2P_4 + a_4P_1P_2P_3, \tag{13.20} $$

where the coefficients $a_1,\ldots,a_4$ are real numbers. We regard the distinct
faces as being linearly independent basis elements for $C_2(S,\mathbb R)$. The space is
therefore four dimensional. If we had triangulated the sphere so that it had
16 triangular faces, the space $C_2$ would be 16 dimensional.

Similarly, the general element of $C_1(S,\mathbb R)$ would be

$$ b_1P_1P_2 + b_2P_1P_3 + b_3P_1P_4 + b_4P_2P_3 + b_5P_2P_4 + b_6P_3P_4, \tag{13.21} $$

and so $C_1(S,\mathbb R)$ is a six-dimensional space spanned by the edges of the
tetrahedron. For $C_0(S,\mathbb R)$ we have

$$ c_1P_1 + c_2P_2 + c_3P_3 + c_4P_4, \tag{13.22} $$

and so $C_0(S,\mathbb R)$ is four dimensional, and spanned by the vertices. Our
manifold comprises only the surface of the two-sphere, so there is no such thing
as $C_3(S,\mathbb R)$.

The reason for making the field $\mathbb R$ explicit in these definitions is that we
sometimes gain more information about the topology if we allow only integer
coefficients. The space of such p-chains is then denoted by $C_p(S,\mathbb Z)$.
Because a vector space requires that coefficients be drawn from a field, these
objects are no longer vector spaces. They can be thought of as either modules
(“vector spaces” whose coefficients are drawn from a ring) or as additive
abelian groups.


Figure 13.4: A tetrahedral triangulation of the 2-sphere. The circulating
arrows on the faces indicate the choice of orientation $P_1P_2P_4$ and $P_2P_3P_4$.
Figure 13.5: The oriented triangle $P_2P_3P_4$ has boundary $P_3P_4 + P_4P_2 + P_2P_3$.


The boundary operator 


We now introduce a linear map $\partial_p : C_p \to C_{p-1}$, called the boundary
operator. Its action on a p-simplex is

$$ \partial_p\left(P_1P_2\cdots P_{p+1}\right) = \sum_{j=1}^{p+1} (-1)^{j+1} P_1\cdots\widehat{P_j}\cdots P_{p+1}, \tag{13.23} $$

where the “hat” indicates that $P_j$ is to be omitted. The resulting $(p-1)$-chain
is called the boundary of the simplex. For example

$$ \begin{aligned}
\partial_2(P_2P_3P_4) &= P_3P_4 - P_2P_4 + P_2P_3 \\
&= P_3P_4 + P_4P_2 + P_2P_3.
\end{aligned} \tag{13.24} $$

The boundary of a line segment is the difference of its endpoints

$$ \partial_1(P_1P_2) = P_2 - P_1. \tag{13.25} $$

Finally, for any point,

$$ \partial_0 P_j = 0. \tag{13.26} $$

Because $\partial_p$ is defined to be a linear map, when it is applied to a p-chain
$c = a_1s_1 + a_2s_2 + \cdots + a_ns_n$, where the $s_i$ are p-simplices, we have
$\partial_p c = a_1\partial_p s_1 + a_2\partial_p s_2 + \cdots + a_n\partial_p s_n$.

When we take the “$\partial$” of a chain of compatibly oriented simplices that
together make up some region, the internal boundaries cancel in pairs, and
the “boundary” of the chain really is the oriented geometric boundary of the
region. For example, in figure 13.6 we find that

$$ \partial(P_1P_5P_2 + P_2P_5P_4 + P_3P_4P_5 + P_1P_3P_5) = P_1P_3 + P_3P_4 + P_4P_2 + P_2P_1, \tag{13.27} $$

which is the anti-clockwise directed boundary of the square.


Figure 13.6: Compatibly oriented simplices.

For each of the examples above, we find that $\partial_{p-1}\partial_p s = 0$. From the
definition (13.23) we can easily establish that this identity holds for any
p-simplex $s$. As chains are sums of simplices and $\partial_p$ is linear, it remains
true for any $c \in C_p$. Thus $\partial_{p-1}\partial_p = 0$. We will usually abbreviate
this statement as $\partial^2 = 0$.
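
The definition (13.23) is easy to mechanize, which gives a quick way to check
$\partial^2 = 0$ on examples. The sketch below is an illustration under stated
assumptions: a chain is stored as a dictionary from vertex tuples to
coefficients, vertices are labelled by integers, and the helper names
`canonical` and `boundary` are choices made here rather than notation from the
text.

```python
from collections import defaultdict

def canonical(simplex):
    """Sort the vertices; return (sorted tuple, sign of the sorting permutation)."""
    verts = list(simplex)
    sign = 1
    for i in range(len(verts)):
        for j in range(len(verts) - 1 - i):
            if verts[j] > verts[j + 1]:
                verts[j], verts[j + 1] = verts[j + 1], verts[j]
                sign = -sign  # each interchange reverses the orientation
    return tuple(verts), sign

def boundary(chain):
    """Apply the boundary operator to a chain {vertex tuple: coefficient}."""
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue  # the boundary of a point is zero
        for j in range(len(simplex)):
            face = simplex[:j] + simplex[j + 1:]   # omit the j-th vertex
            key, sign = canonical(face)
            result[key] += coeff * sign * (-1) ** j
    return {s: c for s, c in result.items() if c != 0}

triangle = {(2, 3, 4): 1}
edges_of_triangle = boundary(triangle)
print(edges_of_triangle)            # edges (3,4), (2,4), (2,3) with signs +, -, +
print(boundary(edges_of_triangle))  # empty chain: the boundary of a boundary

# Four compatibly oriented triangles filling a square with centre vertex 5.
square = {(1, 5, 2): 1, (2, 5, 4): 1, (3, 4, 5): 1, (1, 3, 5): 1}
print(boundary(square))             # only the four outer edges survive
```

Storing each simplex under its sorted vertex tuple, with the permutation sign
moved into the coefficient, is what lets internal edges cancel automatically.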


Cycles, boundaries and homology 


A chain complex is a doubly-infinite sequence of spaces (these can be vector
spaces, modules, abelian groups, or many other mathematical objects) such
as $\ldots, C_{-2}, C_{-1}, C_0, C_1, C_2, \ldots$, together with structure-preserving
maps

$$ \cdots \xrightarrow{\partial_{p+1}} C_p \xrightarrow{\ \partial_p\ } C_{p-1} \xrightarrow{\partial_{p-1}} C_{p-2} \longrightarrow \cdots \tag{13.28} $$

possessing the property that $\partial_{p-1}\partial_p = 0$. The finite sequence of
$C_p$’s that we constructed from our simplicial complex is an example of a chain
complex where $C_p$ is zero-dimensional for $p < 0$ or $p > d$. Chain complexes are
a useful tool in mathematics, and the ideas that we explain in this section
have many applications.

Given any chain complex we can define two important linear subspaces
of each of the $C_p$’s. The first is the space $Z_p$ of p-cycles. This consists of
those $z \in C_p$ such that $\partial_p z = 0$. The second is the space $B_p$ of
p-boundaries, and consists of those $b \in C_p$ such that $b = \partial_{p+1}c$ for some
$c \in C_{p+1}$. Because $\partial^2 = 0$, the boundaries $B_p$ constitute a subspace of
$Z_p$. From these spaces we form the quotient space $H_p = Z_p/B_p$, consisting of
equivalence classes of p-cycles, where we deem $z_1$ and $z_2$ to be equivalent, or
homologous, if they differ by a boundary: $z_2 = z_1 + \partial c$. We write the
equivalence class of cycles homologous to $z_1$ as $[z_1]$. The space $H_p$, or more
accurately, $H_p(\mathbb R)$, is called the p-th (simplicial) homology space of the
chain complex. It becomes the p-th homology group if $\mathbb R$ is replaced by the
integers.

We can construct these homology spaces for any chain complex. When
the chain complex is derived from a simplicial complex decomposition of a
manifold $M$ a remarkable thing happens. The spaces $C_p$, $Z_p$, and $B_p$ all
depend on the details of how the manifold $M$ has been dissected to form
the simplicial complex $S$. The homology space $H_p$, however, is independent
of the dissection. This is neither obvious nor easy to prove. We will rely
on examples to make it plausible. Granted this independence, we will write
$H_p(M)$, or $H_p(M,\mathbb R)$, so as to make it clear that $H_p$ is a property of $M$. The
dimension $b_p$ of $H_p(M)$ is called the p-th Betti number of the manifold:

$$ b_p = \dim H_p(M). \tag{13.29} $$
Example: The Two-Sphere. For the tetrahedral dissection of the two-sphere,
any vertex $P_i$ is homologous to any other, as $P_i - P_j = \partial(P_jP_i)$ and all
$P_jP_i$ belong to $C_1$. Furthermore, $\partial P_i = 0$, so $H_0(S^2)$ is one dimensional.
In general, the dimension of $H_0(M)$ is the number of disconnected pieces
making up $M$. We will write $H_0(S^2) = \mathbb R$, regarding $\mathbb R$ as the archetype of a
one-dimensional vector space.

Now let us consider $H_1(S^2)$. We first find the space of 1-cycles $Z_1$. An
element of $C_1$ will be in $Z_1$ only if each vertex that is the beginning of an edge
is also the end of an edge, and these edges have the same coefficient.
Thus,

$$ z_1 = P_2P_3 + P_3P_4 + P_4P_2 $$

is a cycle, as is

$$ z_2 = P_1P_4 + P_4P_2 + P_2P_1. $$

These are both boundaries of faces of the tetrahedron. It should be fairly
easy to convince yourself that $Z_1$ is the space of linear combinations of these
together with boundaries of the other faces

$$ \begin{aligned}
z_3 &= P_1P_4 + P_4P_3 + P_3P_1, \\
z_4 &= P_1P_3 + P_3P_2 + P_2P_1.
\end{aligned} $$

Any three of these are linearly independent, and so $Z_1$ is three dimensional.
Because all of the cycles are boundaries, every element of $Z_1$ is homologous
to 0, and so $H_1(S^2) = \{0\}$.

We also see that $H_2(S^2) = \mathbb R$. Here the basis element is

$$ P_2P_3P_4 - P_1P_3P_4 + P_1P_2P_4 - P_1P_2P_3 \tag{13.30} $$

which is the 2-chain corresponding to the entire surface of the sphere. It
would be the boundary of the solid tetrahedron, but does not count as a
boundary because the interior of the tetrahedron is not part of the simplicial
complex.
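
These dimension counts reduce to linear algebra: $\dim Z_p$ is the nullity of
$\partial_p$ and $\dim B_p$ is the rank of $\partial_{p+1}$, so
$b_p = \dim C_p - {\rm rank}\,\partial_p - {\rm rank}\,\partial_{p+1}$. The sketch below
computes the Betti numbers of the tetrahedral sphere in this way. It is a hedged
illustration: the integer vertex labels, the exact-fraction Gaussian elimination
and the function names are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations

def boundary_matrix(p_simplices, faces):
    """Matrix of the boundary map, columns = p-simplices, rows = (p-1)-simplices."""
    index = {f: i for i, f in enumerate(faces)}
    matrix = [[0] * len(p_simplices) for _ in faces]
    for col, s in enumerate(p_simplices):
        for j in range(len(s)):
            face = s[:j] + s[j + 1:]          # sorted, since s is sorted
            matrix[index[face]][col] = (-1) ** j
    return matrix

def rank(matrix):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def betti_numbers(simplices_by_dim):
    """simplices_by_dim[p] is the list of p-simplices as sorted vertex tuples."""
    top = len(simplices_by_dim) - 1
    ranks = [0] * (top + 2)   # ranks[p] = rank of the p-th boundary map
    for p in range(1, top + 1):
        ranks[p] = rank(boundary_matrix(simplices_by_dim[p], simplices_by_dim[p - 1]))
    return [len(simplices_by_dim[p]) - ranks[p] - ranks[p + 1] for p in range(top + 1)]

vertices = [(v,) for v in range(1, 5)]
edges = list(combinations(range(1, 5), 2))
faces = list(combinations(range(1, 5), 3))
sphere_betti = betti_numbers([vertices, edges, faces])
print(sphere_betti)
```

Working over the rationals matches the real-coefficient homology used so far;
it would not detect torsion, for which integer methods are needed.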

Example: The Torus. Consider the 2-torus $T^2$. We will see that $H_0(T^2) = \mathbb R$,
$H_1(T^2) = \mathbb R^2 \equiv \mathbb R\oplus\mathbb R$, and $H_2(T^2) = \mathbb R$. A natural basis for
the two-dimensional $H_1(T^2)$ consists of the 1-cycles $\alpha$, $\beta$ portrayed in
figure 13.7.


Figure 13.7: A basis of 1-cycles on the 2-torus.

The cycle $\gamma$ that, in figure 13.2, winds once around the torus is homologous
to $\alpha + \beta$. In terms of the second triangulation of the torus (figure 13.3) we
would have

$$ \begin{aligned}
\alpha &= P_1P_2 + P_2P_3 + P_3P_1, \\
\beta &= P_1P_7 + P_7P_4 + P_4P_1,
\end{aligned} \tag{13.31} $$

and

$$ \begin{aligned}
\gamma &= P_1P_8 + P_8P_6 + P_6P_1 \\
&= \alpha + \beta + \partial(P_1P_8P_2 + P_8P_9P_2 + P_2P_9P_3 + \cdots).
\end{aligned} \tag{13.32} $$
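
The claimed Betti numbers of the torus can be checked on a nine-vertex,
18-triangle dissection of the kind described for figure 13.3. The sketch below
builds such a triangulation from a periodic 3 × 3 grid and computes ranks of the
boundary maps. It is an illustration under assumptions: the vertex numbering
and the direction of the diagonals are choices made here and need not coincide
with the labels in the figure.

```python
from fractions import Fraction
from itertools import combinations

def torus_triangulation(n=3):
    """Periodic n x n grid, each square cut along one diagonal into two triangles."""
    def v(i, j):
        return (i % n) * n + (j % n)
    triangles = set()
    for i in range(n):
        for j in range(n):
            a, b, c, d = v(i, j), v(i + 1, j), v(i + 1, j + 1), v(i, j + 1)
            triangles.add(tuple(sorted((a, b, c))))
            triangles.add(tuple(sorted((a, c, d))))
    edges = {e for t in triangles for e in combinations(t, 2)}
    vertices = {(x,) for t in triangles for x in t}
    return [sorted(vertices), sorted(edges), sorted(triangles)]

def boundary_matrix(p_simplices, faces):
    """Columns are p-simplices, rows are (p-1)-simplices, entries are signs."""
    index = {f: i for i, f in enumerate(faces)}
    matrix = [[0] * len(p_simplices) for _ in faces]
    for col, s in enumerate(p_simplices):
        for j in range(len(s)):
            matrix[index[s[:j] + s[j + 1:]]][col] = (-1) ** j
    return matrix

def rank(matrix):
    """Rank over the rationals by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

torus = torus_triangulation()
rank_d1 = rank(boundary_matrix(torus[1], torus[0]))
rank_d2 = rank(boundary_matrix(torus[2], torus[1]))
torus_betti = [len(torus[0]) - rank_d1,
               len(torus[1]) - rank_d1 - rank_d2,
               len(torus[2]) - rank_d2]
print([len(level) for level in torus], torus_betti)
```

The simplex counts 9, 27, 18 agree with those quoted for the second
triangulation, and the two independent 1-cycles correspond to the classes of
$\alpha$ and $\beta$.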


Example: The Projective Plane. The projective plane $\mathbb RP^2$ can be regarded
as a rectangle with diametrically opposite points identified. Suppose we
decompose $\mathbb RP^2$ into eight triangles, as in figure 13.8.


Figure 13.8: A triangulation of the projective plane.


Consider the “entire surface”

$$ \sigma = P_1P_2P_5 + P_1P_5P_4 + \cdots \in C_2(\mathbb RP^2), \tag{13.33} $$

consisting of the sum of all eight 2-simplices with the orientation indicated
in the figure. Let $\alpha = P_1P_2 + P_2P_3$ and $\beta = P_1P_4 + P_4P_3$ be the sides of the
rectangle running along the bottom horizontal and left vertical sides of the
figure, respectively. In each case they run from $P_1$ to $P_3$. Then

$$ \begin{aligned}
\partial(\sigma) &= P_1P_2 + P_2P_3 + P_3P_4 + P_4P_1 + P_1P_2 + P_2P_3 + P_3P_4 + P_4P_1 \\
&= 2(\alpha - \beta) \ne 0.
\end{aligned} \tag{13.34} $$

Although $\mathbb RP^2$ has no actual edge that we can fall off, from the homological
viewpoint it does have a boundary! This represents the conflict between local
orientation of each of the 2-simplices and the global non-orientability of $\mathbb RP^2$.
The surface $\sigma$ of $\mathbb RP^2$ is not a two-cycle, therefore. Indeed $Z_2(\mathbb RP^2)$, and
a fortiori $H_2(\mathbb RP^2)$, contain only the zero vector. The only one-cycle is
$\alpha - \beta$, which runs from $P_1$ to $P_1$ via $P_2$, $P_3$ and $P_4$, but (13.34) shows
that this is the boundary of $\frac{1}{2}\sigma$. Thus $H_2(\mathbb RP^2,\mathbb R) = \{0\}$ and
$H_1(\mathbb RP^2,\mathbb R) = \{0\}$, while $H_0(\mathbb RP^2,\mathbb R) = \mathbb R$.

We can now see the advantage of restricting ourselves to integer coeffi- 
cients. When we are not allowed fractions, the cycle y = (a — 3) is no longer 
a boundary, although 2(a— (3) is the boundary of a. Thus, using the symbol 
Zz to denote the additive group of the integers modulo two, we can write 
H,(RP?,Z) = Zy. This homology space is a set with only two members 
{0y, ly}. The finite group H,(RP?,Z) = Zz is said to be the torsion part 
of the homology — a confusing terminology because this torsion has nothing 
to do with the torsion tensor of Riemannian geometry. 

We introduced real-number homology first, because the theory of vector 
spaces is simpler than that of modules, and more familiar to physicists. The 
torsion is, however, invisible to the real-number homology. We were therefore 
buying a simplification at the expense of throwing away information. 
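
The loss of torsion information can be seen in a small computation. The sketch below is an illustration, not the calculation of the text: it assumes the standard six-vertex, ten-triangle triangulation of $\mathbb{R}P^2$ (a hypothetical substitute for the eight-triangle dissection of figure 13.8), and the helper names `boundary_matrix` and `rank` are invented for this example. It compares the rank of $\partial_2$ over the rationals with its rank modulo two. A drop in rank modulo two signals an invariant factor divisible by 2, which is the $\mathbb{Z}_2$ torsion that real coefficients cannot see.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: the standard six-vertex triangulation of RP^2 (ten triangles).
# This is NOT the eight-triangle dissection of figure 13.8.
TRIANGLES = [(1, 2, 4), (1, 2, 5), (1, 3, 6), (1, 3, 5), (1, 4, 6),
             (2, 3, 6), (2, 3, 4), (2, 5, 6), (3, 4, 5), (4, 5, 6)]


def boundary_matrix(triangles):
    """Integer matrix of d_2: rows are edges, columns are triangles.

    A triangle (a, b, c) with a < b < c has boundary bc - ac + ab.
    """
    edges = sorted({e for t in triangles for e in combinations(t, 2)})
    row = {e: i for i, e in enumerate(edges)}
    mat = [[0] * len(triangles) for _ in edges]
    for col, (a, b, c) in enumerate(triangles):
        mat[row[(b, c)]][col] += 1
        mat[row[(a, c)]][col] -= 1
        mat[row[(a, b)]][col] += 1
    return mat


def rank(matrix, prime=None):
    """Rank by Gaussian elimination over Q (prime=None) or over GF(prime)."""
    if prime is None:
        rows = [[Fraction(x) for x in r] for r in matrix]
    else:
        rows = [[x % prime for x in r] for r in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            if rows[i][col] == 0:
                continue
            if prime is None:
                factor = rows[i][col] / rows[rk][col]
            else:
                # modular inverse of the pivot (Python 3.8+)
                factor = rows[i][col] * pow(rows[rk][col], -1, prime) % prime
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
            if prime is not None:
                rows[i] = [x % prime for x in rows[i]]
        rk += 1
    return rk


d2 = boundary_matrix(TRIANGLES)
rank_Q = rank(d2)       # full rank: no rational 2-cycle, so H_2 = {0}
rank_F2 = rank(d2, 2)   # one less: the sum of all triangles is a cycle mod 2
print(rank_Q, rank_F2)
```

With 15 edges and a connected complex, $\dim \mathrm{Ker}\,\partial_1 = 15 - 5 = 10$, so a rational rank of 10 for $\partial_2$ reproduces $H_1(\mathbb{R}P^2,\mathbb{R}) = \{0\}$, while the smaller rank modulo two reflects the torsion element $\gamma$.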


The Euler character 


The sum 

$$\chi(M) = \sum_{p=0}^{d} (-1)^p \dim H_p(M,\mathbb{R}) \tag{13.35}$$


is called the Euler character of the manifold $M$. For example, the 2-sphere 
has $\chi(S^2) = 2$, the projective plane has $\chi(\mathbb{R}P^2) = 1$, and the $n$-torus has 
$\chi(T^n) = 0$. This number is manifestly a topological invariant because the 
individual $\dim H_p(M,\mathbb{R})$ are. We will show that the Euler character is 
also equal to $V - E + F - \cdots$ where $V$ is the number of vertices, $E$ is the 
number of edges and $F$ is the number of faces in the simplicial dissection. The 
dots are for higher dimensional spaces, where the alternating sum continues 
with $(-1)^p$ times the number of $p$-simplices. In other words, we are claiming 
that 

$$\chi(M) = \sum_{p=0}^{d} (-1)^p \dim C_p(M). \tag{13.36}$$

It is not so obvious that this new sum is a topological invariant. The indi- 
vidual dimensions of the spaces of p-chains depend on the details of how we 
dissect M into simplices. If our claim is to be correct, the dependence must 
somehow drop out when we take the alternating sum. 

A useful tool for working with alternating sums of vector-space dimensions 
is provided by the notion of an exact sequence. We say that a set of vector 
spaces $V_p$ with maps $f_p : V_p \to V_{p+1}$ is an exact sequence if 
$\mathrm{Ker}(f_p) = \mathrm{Im}(f_{p-1})$. For example, if all cycles were boundaries then the set of spaces $C_p$ 
with the maps $\partial_p$ taking us from $C_p$ to $C_{p-1}$ would constitute an exact sequence 
(albeit with $p$ decreasing rather than increasing, but this is irrelevant). 
When the homology is non-zero, however, we only have $\mathrm{Im}(f_{p-1}) \subseteq \mathrm{Ker}(f_p)$, 
and the number $\dim H_p = \dim(\mathrm{Ker}\, f_p) - \dim(\mathrm{Im}\, f_{p-1})$ provides a measure 
of how far this set inclusion falls short of being an equality. 

Suppose that 

$$\{0\} \xrightarrow{f_0} V_1 \xrightarrow{f_1} V_2 \xrightarrow{f_2} \cdots \xrightarrow{f_{n-1}} V_n \xrightarrow{f_n} \{0\} \tag{13.37}$$

is a finite-length exact sequence. Here, $\{0\}$ is the vector space containing 
only the zero vector. Being linear, $f_0$ maps 0 to 0. Also $f_n$ maps everything 
in $V_n$ to 0. Since this last map takes everything to zero, and what is mapped 
to zero is the image of the penultimate map, we have $V_n = \mathrm{Im}\, f_{n-1}$. Similarly, 
the fact that $\mathrm{Ker}\, f_1 = \mathrm{Im}\, f_0 = \{0\}$ shows that $\mathrm{Im}\, f_1 \subseteq V_2$ is an isomorphic 
image of $V_1$. This situation is represented pictorially in figure 13.9. 


Figure 13.9: A schematic representation of an exact sequence. 


Now the range-nullspace theorem tells us that 

$$\dim V_p = \dim(\mathrm{Im}\, f_p) + \dim(\mathrm{Ker}\, f_p) = \dim(\mathrm{Im}\, f_p) + \dim(\mathrm{Im}\, f_{p-1}). \tag{13.38}$$

When we take the alternating sum of the dimensions, and use $\dim(\mathrm{Im}\, f_0) = 0$ 
and $\dim(\mathrm{Im}\, f_n) = 0$, we find that the sum telescopes to give 

$$\sum_{p=0}^{n} (-1)^p \dim V_p = 0. \tag{13.39}$$

The vanishing of this alternating sum is one of the principal properties of an 
exact sequence. 

Now, for our sequence of spaces $C_p$ with the maps $\partial_p : C_p \to C_{p-1}$, we have 
$\dim(\mathrm{Ker}\, \partial_p) = \dim(\mathrm{Im}\, \partial_{p+1}) + \dim H_p$. Using this and the range-nullspace 
theorem in the same manner as above shows that 

$$\sum_{p=0}^{d} (-1)^p \dim C_p(M) = \sum_{p=0}^{d} (-1)^p \dim H_p(M). \tag{13.40}$$

This confirms our claim. 
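
The equality of the two alternating sums can be checked by machine on the tetrahedral triangulation of $S^2$ used earlier. The following is a minimal sketch under stated assumptions: the function names `rank` and `boundary` are invented here, vertices are labelled 1 to 4, and the Betti numbers are obtained from $\dim H_p = \dim C_p - \mathrm{rank}\,\partial_p - \mathrm{rank}\,\partial_{p+1}$, which is the range-nullspace bookkeeping described above.

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def boundary(faces, simplices):
    """Matrix of the boundary map from p-simplices to (p-1)-simplices."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for col, s in enumerate(simplices):
        for k in range(len(s)):
            face = s[:k] + s[k + 1:]           # omit the k-th vertex
            mat[index[face]][col] = (-1) ** k  # alternating sign
    return mat


# Surface of a tetrahedron: every vertex, edge and triangle on P1..P4,
# but no solid 3-simplex.
verts = [(v,) for v in range(1, 5)]
edges = list(combinations(range(1, 5), 2))
tris = list(combinations(range(1, 5), 3))
chains = [verts, edges, tris]                  # bases of C_0, C_1, C_2

# rank of d_p for p = 0..3, with d_0 = 0 and d_3 = 0
ranks = [0, rank(boundary(verts, edges)), rank(boundary(edges, tris)), 0]
betti = [len(chains[p]) - ranks[p] - ranks[p + 1] for p in range(3)]

euler_from_chains = sum((-1) ** p * len(chains[p]) for p in range(3))
euler_from_homology = sum((-1) ** p * betti[p] for p in range(3))
print(betti, euler_from_chains, euler_from_homology)
```

Here $V - E + F = 4 - 6 + 4$ and the Betti numbers $(1, 0, 1)$ both give $\chi(S^2) = 2$, in agreement with (13.40).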




Exercise 13.1: Count the number of vertices, edges and faces in the triangulation 
we used to compute the homology groups of the real projective plane 
$\mathbb{R}P^2$. Verify that $V - E + F = 1$, and that this is the same number that we 
get by evaluating 

$$\chi(\mathbb{R}P^2) = \dim H_0(\mathbb{R}P^2,\mathbb{R}) - \dim H_1(\mathbb{R}P^2,\mathbb{R}) + \dim H_2(\mathbb{R}P^2,\mathbb{R}).$$

Exercise 13.2: Show that the sequence 

$$\{0\} \to V \xrightarrow{\phi} W \to \{0\}$$

of vector spaces being exact means that the map $\phi : V \to W$ is one-to-one 
and onto, and hence an isomorphism $V \cong W$. 


Exercise 13.3: Show that a short exact sequence 

$$\{0\} \to A \xrightarrow{i} B \xrightarrow{\pi} C \to \{0\}$$

of vector spaces is just a sophisticated way of asserting that $C \cong B/A$. More 
precisely, show that the map $i$ is injective (one-to-one), so $A$ can be considered 
to be a subspace of $B$. Then show that the map $\pi$ is surjective (onto), and 
can be regarded as projecting $B$ onto the equivalence classes $B/A$. 


Exercise 13.4: Let $\alpha : A \to B$ be a linear map. Show that 

$$\{0\} \to \mathrm{Ker}\,\alpha \to A \xrightarrow{\alpha} B \to \mathrm{Coker}\,\alpha \to \{0\}$$

is an exact sequence. (Recall that $\mathrm{Coker}\,\alpha = B/\mathrm{Im}\,\alpha$.) 


13.3.2 Relative homology 


Mathematicians have invented powerful tools for computing homology. In 
this section we introduce one of them: the exact sequence of a pair. We 
describe this tool in detail because a homotopy analogue of this exact se- 
quence is used in physics to classify defects such as dislocations, vortices and 
monopoles. Homotopy theory is however harder and requires more technical 
apparatus than homology, so the ideas are easier to explain here. 

We have seen that it is useful to think of complicated manifolds as being 
assembled out of simpler ones. We constructed the torus, for example, by 
gluing together edges of a rectangle. Another construction technique involves 
shrinking parts of a manifold to a point. Think, for example, of the unit 2-disc 
as being a circle of cloth with a drawstring sewn into its boundary. Now 
pull the string tight to form a spherical bag. The continuous functions on 
the resulting 2-sphere are those continuous functions on the disc that took 
the same value at all points on its boundary. Recall that we used this idea in 
section 12.4.2, where we claimed that those spin textures in $\mathbb{R}^2$ that point in 
a fixed direction at infinity can be thought of as spin textures on the 2-sphere. 
We now extend this shrinking trick to homology. 

Suppose that we have a chain complex consisting of spaces $C_p$ and boundary 
operations $\partial_p$. We denote this chain complex by $(C, \partial)$. Another set 
of spaces and boundary operations $(C', \partial')$ is a subcomplex of $(C, \partial)$ if each 
$C'_p \subseteq C_p$ and $\partial'_p(c) = \partial_p(c)$ for each $c \in C'_p$. This situation arises if we have a 
simplicial complex $S$ and some subset $S'$ that is itself a simplicial complex, 
and take $C'_p = C_p(S')$. 

Since each $C'_p$ is a subspace of $C_p$ we can form the quotient spaces $C_p/C'_p$ 
and make them into a chain complex by defining, for $c + C'_p \in C_p/C'_p$, 

$$\partial_p(c + C'_p) = \partial_p c + C'_{p-1}. \tag{13.41}$$

It is easy to see that this operation is well defined (i.e. it gives the same output 
independent of the choice of representative in the equivalence class $c + C'_p$), 
that $\partial_p : C_p/C'_p \to C_{p-1}/C'_{p-1}$ is a linear map, and that $\partial_{p-1}\partial_p = 0$. We have 
constructed a new chain complex $(C/C', \partial)$. We can therefore form its homology 
spaces in the usual way. The resulting vector space, or abelian group, 
$H_p(C/C')$ is the $p$-th relative homology group of $C$ modulo $C'$. When $C'$ and 
$C$ arise from simplicial complexes $S' \subseteq S$, these spaces are what remains of 
the homology of $S$ after every chain in $S'$ has been shrunk to a point. In 
this case, it is customary to write $H_p(S,S')$ instead of $H_p(C/C')$, and similarly 
write the chain, cycle and boundary spaces as $C_p(S,S')$, $Z_p(S,S')$ and 
$B_p(S,S')$ respectively. 

Example: Constructing the two-sphere $S^2$ from the two-ball (or disc) $B^2$. 
We regard $B^2$ to be the triangular simplex $P_1P_2P_3$, and its boundary, the 
one-sphere or circle $S^1$, to be the simplicial complex containing the points $P_1$, 
$P_2$, $P_3$ and the sides $P_1P_2$, $P_2P_3$, $P_3P_1$, but not the interior of the triangle. 
We wish to contract this boundary complex to a point, and form the relative 
chain complexes and their homology spaces. Of the spaces we quotient by, 
$C_0(S^1)$ is spanned by the points $P_1$, $P_2$, $P_3$, the 1-chain space $C_1(S^1)$ is 
spanned by the sides $P_1P_2$, $P_2P_3$, $P_3P_1$, while $C_2(S^1) = \{0\}$. The space of 
relative chains $C_2(B^2, S^1)$ consists of multiples of $P_1P_2P_3 + C_2(S^1)$, and the 
boundary 

$$\partial_2\big(P_1P_2P_3 + C_2(S^1)\big) = (P_2P_3 + P_3P_1 + P_1P_2) + C_1(S^1) \tag{13.42}$$

is equivalent to zero because $P_2P_3 + P_3P_1 + P_1P_2 \in C_1(S^1)$. Thus $P_1P_2P_3 + C_2(S^1)$ 
is a non-bounding cycle and spans $H_2(B^2,S^1)$, which is therefore 
one dimensional. This space is isomorphic to the one-dimensional $H_2(S^2)$. 
Similarly $H_1(B^2,S^1)$ is zero dimensional, and so isomorphic to $H_1(S^2)$. This 
is because all chains in $C_1(B^2,S^1)$ are in $C_1(S^1)$ and therefore equivalent to 
zero. 

A peculiarity, however, is that $H_0(B^2, S^1)$ is not isomorphic to $H_0(S^2) = \mathbb{R}$. 
Instead, we find that $H_0(B^2, S^1) = \{0\}$ because all the points are equivalent 
to zero. This vanishing is characteristic of the zeroth relative homology 
space $H_0(S, S')$ for the simplicial triangulation of any connected manifold. 
It occurs because $S$ being connected means that any point $P$ in $S$ can be 
reached by walking along edges from any other point, in particular from a 
point $P'$ in $S'$. This makes $P$ homologous to $P'$, and so equivalent to zero 
in $H_0(S, S')$. 
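
The quotient construction is easy to mechanise: a basis for $C_p(S)/C_p(S')$ is given by the $p$-simplices of $S$ that are not in $S'$, and the relative boundary map simply discards faces lying in $S'$. The sketch below illustrates this under stated assumptions. The helper names `rank`, `quotient_boundary` and `relative_betti` are invented for the example, and the "coned" disc with an interior vertex 4 is a hypothetical finer triangulation added to exercise the rank computation; only the single-triangle disc is the example of the text.

```python
from fractions import Fraction


def rank(matrix):
    """Rank over the rationals by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def quotient_boundary(rel_faces, rel_simplices):
    """Boundary map on C_p(S)/C_p(S'), written in the surviving simplices."""
    index = {f: i for i, f in enumerate(rel_faces)}
    mat = [[0] * len(rel_simplices) for _ in rel_faces]
    for col, s in enumerate(rel_simplices):
        for k in range(len(s)):
            face = s[:k] + s[k + 1:]
            if face in index:              # faces in S' are equivalent to zero
                mat[index[face]][col] = (-1) ** k
    return mat


def relative_betti(simplices, sub):
    """dim H_p(S, S') for p = 0, 1, ...; inputs are lists indexed by dimension."""
    rel = [[s for s in simplices[p] if s not in sub[p]]
           for p in range(len(simplices))]
    top = len(rel)
    ranks = [0] + [rank(quotient_boundary(rel[p - 1], rel[p]))
                   for p in range(1, top)] + [0]
    return [len(rel[p]) - ranks[p] - ranks[p + 1] for p in range(top)]


circle = [[(1,), (2,), (3,)], [(1, 2), (1, 3), (2, 3)], []]

# The disc of the text: a single triangle P1 P2 P3.
disc = [circle[0], circle[1], [(1, 2, 3)]]
betti_disc = relative_betti(disc, circle)

# Assumed finer disc: cone over the circle with apex 4.
cone = [circle[0] + [(4,)],
        circle[1] + [(1, 4), (2, 4), (3, 4)],
        [(1, 2, 4), (1, 3, 4), (2, 3, 4)]]
betti_cone = relative_betti(cone, circle)
print(betti_disc, betti_cone)
```

Both triangulations give relative Betti numbers $(0, 0, 1)$, matching $H_0(B^2,S^1) = \{0\}$, $H_1(B^2,S^1) = \{0\}$ and the one-dimensional $H_2(B^2,S^1)$ found above.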


Exact homology sequence of a pair 


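Homological algebra is full of miracles. Here we describe one of them. From 
the ingredients we have at hand, we can construct a semi-infinite sequence 
of spaces and linear maps between them 

$$\begin{aligned}
\cdots &\xrightarrow{\partial_{*p+1}} H_p(S') \xrightarrow{i_{*p}} H_p(S) \xrightarrow{\pi_{*p}} H_p(S,S') \\
&\xrightarrow{\partial_{*p}} H_{p-1}(S') \xrightarrow{i_{*p-1}} H_{p-1}(S) \xrightarrow{\pi_{*p-1}} H_{p-1}(S,S') \\
&\xrightarrow{\partial_{*p-1}} \cdots \\
\cdots &\xrightarrow{\partial_{*1}} H_0(S') \xrightarrow{i_{*0}} H_0(S) \xrightarrow{\pi_{*0}} H_0(S,S') \xrightarrow{\partial_{*0}} \{0\}.
\end{aligned} \tag{13.43}$$

The maps $i_{*p}$ and $\pi_{*p}$ are induced by the natural injection $i_p : C_p(S') \to C_p(S)$ 
and projection $\pi_p : C_p(S) \to C_p(S)/C_p(S')$. It is only necessary to check that 

$$\pi_{p-1}\partial_p = \partial_p\pi_p, \qquad i_{p-1}\partial_p = \partial_p i_p, \tag{13.44}$$
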
to see that they are compatible with the passage from the chain spaces to 
the homology spaces. More discussion is required of the connection map $\partial_{*p}$ 
that takes us from one row to the next in the displayed form of (13.43). 

The connection map is constructed as follows: Let $h \in H_p(S,S')$. Then 
$h = z + B_p(S,S')$ for some cycle $z \in Z_p(S,S')$, and in turn $z = c + C_p(S')$ 
for some $c \in C_p(S)$. (So two choices of representative of equivalence class 
are being made here.) Now $\partial_p z = 0$, which means that $\partial_p c \in C_{p-1}(S')$. This 
fact, when combined with $\partial_{p-1}\partial_p = 0$, tells us that $\partial_p c \in Z_{p-1}(S')$. We now 
define the $\partial_{*p}$ image of $h$ to be 

$$\partial_{*p}(h) = \partial_p c + B_{p-1}(S'). \tag{13.45}$$

This sounds rather involved, but let’s say it again in words: an element of 
$H_p(S,S')$ is a relative $p$-cycle modulo $S'$. This means that its boundary is 
not necessarily zero, but may be a non-zero element of $C_{p-1}(S')$. Since this 
element is the boundary of something its own boundary vanishes, so it is a 
$(p-1)$-cycle in $C_{p-1}(S')$ and hence a representative of a homology class in 
$H_{p-1}(S')$. This homology class is the output of the $\partial_{*p}$ map. 

The miracle is that the sequence of maps (13.43) is exact. It is an example 
of a standard homological algebra construction of a long exact sequence out 
of a family of short exact sequences, in this case out of the sequences 

$$\{0\} \to C_p(S') \to C_p(S) \to C_p(S,S') \to \{0\}. \tag{13.46}$$

Proving that the long sequence is exact is straightforward. All one must do 
is check each map to see that it has the properties required. This exercise in 
what is called diagram chasing is left to the reader. 

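The long exact sequence that we have constructed is called the exact 
homology sequence of a pair. If we know that certain homology spaces are 
zero dimensional, it provides a powerful tool for computing other spaces in 
the sequence. As an illustration, consider the sequence of the pair $B^{n+1}$ and 
$S^n$ for $n > 0$: 

$$\begin{aligned}
\cdots &\xrightarrow{i_{*p}} \underbrace{H_p(B^{n+1})}_{=\{0\}} \xrightarrow{\pi_{*p}} H_p(B^{n+1},S^n) \xrightarrow{\partial_{*p}} H_{p-1}(S^n) \\
&\xrightarrow{i_{*p-1}} \underbrace{H_{p-1}(B^{n+1})}_{=\{0\}} \xrightarrow{\pi_{*p-1}} H_{p-1}(B^{n+1},S^n) \xrightarrow{\partial_{*p-1}} H_{p-2}(S^n) \\
&\quad\vdots \\
&\xrightarrow{i_{*1}} \underbrace{H_1(B^{n+1})}_{=\{0\}} \xrightarrow{\pi_{*1}} H_1(B^{n+1},S^n) \xrightarrow{\partial_{*1}} \underbrace{H_0(S^n)}_{=\mathbb{R}} \\
&\xrightarrow{i_{*0}} \underbrace{H_0(B^{n+1})}_{=\mathbb{R}} \xrightarrow{\pi_{*0}} H_0(B^{n+1},S^n) \xrightarrow{\partial_{*0}} \{0\}.
\end{aligned} \tag{13.47}$$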


We have inserted here the easily-established data that $H_p(B^{n+1}) = \{0\}$ for 
$p > 0$ (which is a consequence of the $(n+1)$-ball being a contractible space), 
and that $H_0(B^{n+1})$ and $H_0(S^n)$ are one dimensional because they consist of 
a single connected component. We read off, from the $\{0\} \to A \to B \to \{0\}$ 
exact subsequences, the isomorphisms 

$$H_p(B^{n+1},S^n) \cong H_{p-1}(S^n), \qquad p > 1, \tag{13.48}$$

and from the exact sequence 

$$\{0\} \to H_1(B^{n+1},S^n) \to \mathbb{R} \to \mathbb{R} \to H_0(B^{n+1},S^n) \to \{0\} \tag{13.49}$$

that $H_1(B^{n+1},S^n) = \{0\} = H_0(B^{n+1},S^n)$. The first of these equalities holds 
because $H_1(B^{n+1},S^n)$ is the kernel of the isomorphism $\mathbb{R} \to \mathbb{R}$, and the 
second because $H_0(B^{n+1},S^n)$ is the range of a surjective null map. 

In the case $n = 0$, we have to modify our last conclusion because $H_0(S^0) = \mathbb{R} \oplus \mathbb{R}$ 
is two dimensional. (Remember that $H_0(M)$ counts the number of 
disconnected components of $M$, and the zero-sphere $S^0$ consists of the two 
disconnected points $P_1$, $P_2$ lying in the boundary of the interval $B^1 = P_1P_2$.) 
As a consequence, the last five maps in (13.47) become 

$$\{0\} \to H_1(B^1,S^0) \to \mathbb{R} \oplus \mathbb{R} \to \mathbb{R} \to H_0(B^1,S^0) \to \{0\}. \tag{13.50}$$

This tells us that $H_1(B^1,S^0) = \mathbb{R}$ and $H_0(B^1,S^0) = \{0\}$. 


Exact homotopy sequence of a pair 


The construction of a long exact sequence from a short exact sequence is 
a very powerful technique. It has become almost ubiquitous in advanced 
mathematics. Here we briefly describe an application to homotopy theory. 
We have met the homotopy groups $\pi_n(M)$ in section 12.4.4. As we saw 
there, homotopy groups can be used to classify defects or textures in physical 
systems in which some field takes values in a manifold $M$. Suppose that the 
local physical properties of a system are invariant under the action of a Lie 
group $G$ (for example, the high temperature phase of a ferromagnet may be 
invariant under the rotation group SO(3)). Now suppose that the system undergoes 
spontaneous symmetry breaking and becomes invariant only under a subgroup 
$H$. Then the manifold of inequivalent states is the coset $G/H$. For 
a ferromagnet the symmetry breaking will be from $G = \mathrm{SO}(3)$ to $H = \mathrm{SO}(2)$, 
where SO(2) is the group of rotations about the axis of magnetization. $G/H$ 
is then the 2-sphere of directions in which the magnetization can point. 

The group $\pi_n(G)$ can be taken to be the set of continuous maps of an 
$n$-dimensional cube into the group $G$, with the surface of the cube mapping 
to the identity element $e \in G$. We similarly define the relative homotopy 
group $\pi_n(G,H)$ of $G$ modulo $H$ to be the set of continuous maps of the 
cube into $G$, with all-but-one face of the cube mapping to $e$, but with the 
remaining face mapping to the subgroup $H$. It can then be shown that 
$\pi_n(G/H) \cong \pi_n(G,H)$ (the hard part is to show that any continuous map 
into $G/H$ can be represented as the projection of some continuous map into 
$G$). 


The short exact sequence 

$$\{e\} \to H \xrightarrow{i} G \xrightarrow{\pi} G/H \to \{e\} \tag{13.51}$$

of group homomorphisms (where $\{e\}$ is the group consisting only of the 
identity element) then gives rise to the long exact sequence 

$$\cdots \to \pi_n(H) \to \pi_n(G) \to \pi_n(G,H) \to \pi_{n-1}(H) \to \cdots. \tag{13.52}$$


The derivation and utility of this exact sequence is very well described in 
the review article by Mermin cited in section 12.4.4. We have therefore 
contented ourselves with simply displaying the result so that the reader can 
see the similarity between the homology theorem and its homotopy-theory 
analogue. 
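
As a brief illustration of how the sequence (13.52) is used, consider the ferromagnet with $G = \mathrm{SO}(3)$ and $H = \mathrm{SO}(2)$. The worked step below is a sketch that quotes, without derivation, the standard values $\pi_2(\mathrm{SO}(3)) = \{0\}$, $\pi_1(\mathrm{SO}(2)) = \mathbb{Z}$ and $\pi_1(\mathrm{SO}(3)) = \mathbb{Z}_2$, together with the fact that a $2\pi$ rotation about a fixed axis generates $\pi_1(\mathrm{SO}(3))$; none of these inputs is established in this section.

```latex
% Segment of (13.52) at n = 2, using \pi_2(G,H) \cong \pi_2(G/H) = \pi_2(S^2):
\begin{aligned}
\pi_2(\mathrm{SO}(3)) &\to \pi_2(\mathrm{SO}(3),\mathrm{SO}(2)) \to \pi_1(\mathrm{SO}(2)) \to \pi_1(\mathrm{SO}(3)) \\
\{0\} &\to \pi_2(S^2) \to \mathbb{Z} \to \mathbb{Z}_2 .
\end{aligned}
% Exactness at \pi_2(S^2): the map into \mathbb{Z} is injective.
% Exactness at \mathbb{Z}: its image is the kernel of \mathbb{Z} \to \mathbb{Z}_2.
% That map sends the generator to the generator, so the kernel is 2\mathbb{Z}:
\pi_2(S^2) \cong \mathrm{Ker}\,(\mathbb{Z} \to \mathbb{Z}_2) = 2\mathbb{Z} \cong \mathbb{Z}.
```

The integer labelling $\pi_2(S^2)$ is the winding number of the spin textures on the 2-sphere mentioned earlier in this section.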


13.4 De Rham’s theorem 


We still have not related homology to cohomology. The link is provided by 
integration. 
The integral provides a natural pairing of a $p$-chain $c$ and a $p$-form $\omega$: if 
$c = a_1 s_1 + a_2 s_2 + \cdots + a_n s_n$, where the $s_i$ are simplices, we set 

$$(c, \omega) = \sum_i a_i \int_{s_i} \omega. \tag{13.53}$$

The perhaps mysterious notion of “adding” geometric simplices is thus given 
a concrete interpretation in terms of adding real numbers. 

Stokes’ theorem now reads 

$$(\partial c, \omega) = (c, d\omega), \tag{13.54}$$

suggesting that $d$ and $\partial$ should be regarded as adjoints of each other. From 
this observation follows the key fact that the pairing between chains and 
forms descends to a pairing between homology classes and cohomology classes. 
In other words, 

$$(z + \partial c, \omega + d\chi) = (z, \omega), \tag{13.55}$$

so it does not matter which representatives of the two equivalence classes we 
take when we compute the integral. Let us see why this is so. 

Suppose $z \in Z_p$ and $\omega_2 = \omega_1 + d\eta$. Then 

$$(z, \omega_2) = \int_z \omega_1 + \int_z d\eta = \int_z \omega_1 + \int_{\partial z} \eta = \int_z \omega_1 = (z, \omega_1) \tag{13.56}$$

because $\partial z = 0$. Thus, all elements of the cohomology class of $\omega$ return the 
same answer when integrated over a cycle. 

Similarly, if $\omega \in Z^p$ and $c_2 = c_1 + \partial a$ then 

$$(c_2, \omega) = \int_{c_1} \omega + \int_{\partial a} \omega = \int_{c_1} \omega + \int_a d\omega = \int_{c_1} \omega = (c_1, \omega),$$

since $d\omega = 0$. 

All this means that we can consider the equivalence classes of closed forms 
composing $H^p_{\rm dR}(M)$ to be elements of $(H_p(M))^*$, the dual space of $H_p(M)$ 
(hence the “co” in cohomology). The existence of the pairing does not 
automatically mean that $H^p_{\rm dR}$ is the dual space to $H_p(M)$, however, because 
there might be elements of the dual space that are not in $H^p_{\rm dR}$, and there 
might be distinct elements of $H^p_{\rm dR}$ that give identical answers when integrated 
over any cycle, and so correspond to the same element in $(H_p(M))^*$. This 
does not happen, however, when the manifold is compact: De Rham showed 
that, for compact manifolds, $(H_p(M,\mathbb{R}))^* = H^p_{\rm dR}(M,\mathbb{R})$. We will not try to 
prove this, but be satisfied with some examples. 

The statement $(H_p(M))^* = H^p_{\rm dR}(M)$ neatly summarizes de Rham’s results, 
but, in practice, the more explicit statements given below are more 
useful. 


Theorem: (de Rham) Suppose that $M$ is a compact manifold. 

1) A closed $p$-form $\omega$ is exact if and only if 

$$\int_{z_i} \omega = 0 \tag{13.57}$$

for all cycles $z_i \in Z_p$. It suffices to check this for one representative of 
each homology class. 

2) If $z_i \in Z_p$, $i = 1, \ldots, \dim H_p$, is a basis for the $p$-th homology space, 
and $\alpha_i$ a set of numbers, one for each $z_i$, then there exists a closed 
$p$-form $\omega$ such that 

$$\int_{z_i} \omega = \alpha_i. \tag{13.58}$$

If $\omega^i$ constitute a basis of the vector space $H^p(M)$ then the matrix of numbers 

$$\Omega_i{}^j = (z_i, \omega^j) = \int_{z_i} \omega^j \tag{13.59}$$

is called the period matrix, and the $\Omega_i{}^j$ themselves are the periods. 

Example: $H_1(T^2) = \mathbb{R} \oplus \mathbb{R}$ is two dimensional. Since a finite-dimensional 
vector space and its dual have the same dimension, de Rham tells us that 
$H^1_{\rm dR}(T^2)$ is also two-dimensional. If we take as coordinates on $T^2$ the angles 
$\theta$ and $\phi$, then the basis elements, or generators, of the cohomology spaces are 
the forms “$d\theta$” and “$d\phi$”. We have inserted the quotes to stress that these 
expressions are not the $d$ of a function. The angles $\theta$ and $\phi$ are not functions 
on the torus, since they are not single-valued. The homology basis 1-cycles 
can be taken as $z_\theta$ running from $\theta = 0$ to $\theta = 2\pi$ along $\phi = \pi$, and $z_\phi$ running 
from $\phi = 0$ to $\phi = 2\pi$ along $\theta = \pi$. Clearly, $\omega = \alpha_\theta\, d\theta/2\pi + \alpha_\phi\, d\phi/2\pi$ returns 
$\int_{z_\theta} \omega = \alpha_\theta$ and $\int_{z_\phi} \omega = \alpha_\phi$ for any $\alpha_\theta$, $\alpha_\phi$, so $\{d\theta/2\pi, d\phi/2\pi\}$ and $\{z_\theta, z_\phi\}$ are 
dual bases. 
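
The duality of these bases, and the fact that adding an exact form leaves the periods unchanged, can be checked numerically. The sketch below is illustrative only: the values `A_THETA` and `A_PHI`, the single-valued function $g = \sin\theta\cos\phi$ and the helper names are assumptions made for this example. It integrates $\omega = \alpha_\theta\, d\theta/2\pi + \alpha_\phi\, d\phi/2\pi + dg$ around $z_\theta$ and $z_\phi$ with the midpoint rule.

```python
import math

A_THETA, A_PHI = 1.5, -0.7   # hypothetical periods chosen for illustration


def omega(theta, phi):
    """Components (w_theta, w_phi) of a closed one-form on the torus.

    omega = A_THETA dtheta/2pi + A_PHI dphi/2pi + dg, where the single-valued
    function g = sin(theta) cos(phi) contributes only an exact piece.
    """
    w_theta = A_THETA / (2 * math.pi) + math.cos(theta) * math.cos(phi)
    w_phi = A_PHI / (2 * math.pi) - math.sin(theta) * math.sin(phi)
    return w_theta, w_phi


def period(cycle, steps=2000):
    """Midpoint-rule integral of omega around z_theta or z_phi."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        if cycle == "theta":          # theta runs 0..2pi along phi = pi
            total += omega(t, math.pi)[0] * h
        else:                         # phi runs 0..2pi along theta = pi
            total += omega(math.pi, t)[1] * h
    return total


period_theta = period("theta")
period_phi = period("phi")
print(period_theta, period_phi)
```

Up to rounding, the two integrals return `A_THETA` and `A_PHI`: the exact piece $dg$ integrates to zero around each closed cycle, as (13.56) requires.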

Example: We have earlier computed the homology groups $H_2(\mathbb{R}P^2, \mathbb{R}) = \{0\}$ 
and $H_1(\mathbb{R}P^2,\mathbb{R}) = \{0\}$. De Rham therefore tells us that $H^2(\mathbb{R}P^2,\mathbb{R}) = \{0\}$ 
and $H^1(\mathbb{R}P^2,\mathbb{R}) = \{0\}$. From this we deduce that all closed one- and two-forms 
on the projective plane $\mathbb{R}P^2$ are exact. 

Example: As an illustration of de Rham part 1), observe that it is easy to 
show that a closed one-form $\phi$ can be written as $df$, provided that $\int_{z_i} \phi = 0$ 
for all cycles. We simply define $f = \int_{x_0}^{x} \phi$, and observe that the proviso 
ensures that $f$ is not multivalued. 

Example: A more subtle problem is to show that, given a two-form $\omega$ on $S^2$, 
with $\int_{S^2} \omega = 0$ there is a globally-defined $\chi$ such that $\omega = d\chi$. We begin 
by covering $S^2$ by two open sets $D_+$ and $D_-$ which have the form of caps 
such that $D_+$ includes all of $S^2$ except for a neighbourhood of the south 
pole, while $D_-$ includes all of $S^2$ except a neighbourhood of the north pole, 
and the intersection, $D_+ \cap D_-$, has the topology of an annulus, or cingulum, 
encircling the equator. 


Figure 13.10: A covering of the 2-sphere by a pair of contractible caps. 


Since both $D_+$ and $D_-$ are contractible, there are one-forms $\chi_+$ and $\chi_-$ such 
that $\omega = d\chi_+$ in $D_+$ and $\omega = d\chi_-$ in $D_-$. Thus, 

$$d(\chi_+ - \chi_-) = 0, \quad \text{in } D_+ \cap D_-. \tag{13.60}$$
Dividing the sphere into two disjoint sets with a common (but opposingly 
oriented) boundary $\Gamma \subset D_+ \cap D_-$, we have 

$$0 = \int_{S^2} \omega = \oint_\Gamma (\chi_+ - \chi_-) \tag{13.61}$$

and this is true for any such curve $\Gamma$. Thus, by the previous example, 

$$\phi \equiv (\chi_+ - \chi_-) = df \tag{13.62}$$

for some smooth function $f$ defined in $D_+ \cap D_-$. We now introduce a partition 
of unity subordinate to the cover of $S^2$ by $D_+$ and $D_-$. This partition is a 
pair of non-negative smooth functions, $\rho_\pm$, such that $\rho_+$ is non-zero only in 
$D_+$, $\rho_-$ is non-zero only in $D_-$, and $\rho_+ + \rho_- = 1$. Now 

$$f = \rho_+ f - (-\rho_-) f, \tag{13.63}$$

and $f_- = \rho_+ f$ is a function defined everywhere on $D_-$. Similarly 
$f_+ = (-\rho_-) f$ is a function on $D_+$. Notice the interchange of $\pm$ labels! This is not 
a mistake. The function $f$ is not defined outside $D_+ \cap D_-$, but we can define 
$\rho_- f$ everywhere on $D_+$ because $f$ gets multiplied by zero wherever we have 
no specific value to assign to it. We now observe that 

$$\chi_+ + df_+ = \chi_- + df_-, \quad \text{in } D_+ \cap D_-. \tag{13.64}$$

Thus $\omega = d\chi$, where $\chi$ is defined everywhere by the rule 

$$\chi = \begin{cases} \chi_+ + df_+, & \text{in } D_+, \\ \chi_- + df_-, & \text{in } D_-. \end{cases} \tag{13.65}$$

It does not matter which definition we take in the cingular region $D_+ \cap D_-$, 
because the two definitions coincide there. 

The methods of this example can be extended to give a proof of de Rham’s 
claims. 


13.5 Poincaré duality 


De Rham’s theorem does not require that our manifold $M$ be orientable. Our 
next results do, however, require orientability. We therefore assume throughout 
this section that $M$ is a compact, orientable, $D$-dimensional manifold. 
We will also require that $M$ is a closed manifold, meaning that it has no 
boundary. 


We begin with the observation that if the forms $\omega_1$ and $\omega_2$ are closed then 
so is $\omega_1 \wedge \omega_2$. Furthermore, if one or both of $\omega_1$, $\omega_2$ is exact then the product 
$\omega_1 \wedge \omega_2$ is also exact. It follows that the cohomology class $[\omega_1 \wedge \omega_2]$ of $\omega_1 \wedge \omega_2$ 
depends only on the cohomology classes $[\omega_1]$ and $[\omega_2]$. The wedge product 
thus induces a map 

$$H^p(M,\mathbb{R}) \times H^q(M,\mathbb{R}) \xrightarrow{\ \wedge\ } H^{p+q}(M,\mathbb{R}), \tag{13.66}$$

which is called the “cup product” of the cohomology classes. It is written 
as 

$$[\omega_1 \wedge \omega_2] = [\omega_1] \cup [\omega_2], \tag{13.67}$$

and gives the cohomology the structure of a graded-commutative ring, 
denoted by $H^*(M,\mathbb{R})$. 

More significant for us than the ring structure is that, given $\omega \in H^D(M,\mathbb{R})$, 
we can obtain a real number by forming $\int_M \omega$. (This is the point at which 
we need orientability. We only know how to integrate over orientable chains, 
and so cannot even define $\int_M \omega$ when $M$ is not orientable.) We can combine 
this integral with the cup product to make any cohomology class 
$[f] \in H^{D-p}(M,\mathbb{R})$ into an element $F$ of $(H^p(M,\mathbb{R}))^*$. We do this by setting 

$$F([g]) = \int_M f \wedge g \tag{13.68}$$

for each $[g] \in H^p(M,\mathbb{R})$. Furthermore, it is possible to show that we can 
get any element $F$ of $(H^p(M,\mathbb{R}))^*$ in this way, and the corresponding $[f]$ is 
unique. But de Rham has already given us a way of identifying the elements 
of $(H^p(M,\mathbb{R}))^*$ with the cycles in $H_p(M,\mathbb{R})$! There is, therefore, a 1-1 onto 
map 

$$H_p(M,\mathbb{R}) \leftrightarrow H^{D-p}(M,\mathbb{R}). \tag{13.69}$$

In particular the dimensions of these two spaces must coincide: 

$$b_p(M) = b_{D-p}(M). \tag{13.70}$$


This equality of Betti numbers is called Poincaré duality. Poincaré originally 
conceived of it geometrically. His idea was to construct from each simplicial 
triangulation $S$ of $M$ a new “dual” triangulation $S'$, where, in two dimensions 
for example, we place a new vertex at the centre of each triangle, and join the 
vertices by lines through each side of the old triangles to make new cells, 
each new cell containing one of the old vertices. If we are lucky, this process 
will have the effect of replacing each $p$-simplex by a $(D-p)$-simplex, and so 
set up a map between $C_p(S)$ and $C_{D-p}(S')$ that turns the homology “upside 
down.” The new cells are not always simplices, however, and it is hard to 
make this construction systematic. Poincaré’s original recipe was flawed. 
Our present approach to Poincaré’s result is asserting that for each basis 
$p$-cycle class $[z_i^p]$ there is a unique (up to cohomology) $(D-p)$-form $\omega_i^{D-p}$ 
such that 

$$\int_{z_i^p} f = \int_M \omega_i^{D-p} \wedge f. \tag{13.71}$$

We can construct this $\omega_i^{D-p}$ “physically” by taking a representative cycle $z_i^p$ 
in the homology class $[z_i^p]$ and thinking of it as a surface with a conserved 
unit $(D-p)$-form current flowing in its vicinity. An example would be the 
two-form topological current running along the one-dimensional worldline of 
a skyrmion. (See the discussion surrounding equation (12.64).) The $\omega_i^{D-p}$ 
form a basis for $H^{D-p}(M,\mathbb{R})$. We can therefore expand $f \sim f^i \omega_i^{D-p}$, and 
similarly for the closed $p$-form $g$, to obtain 

$$\int_M f \wedge g = f^i g^j I(i,j), \tag{13.72}$$

where the matrix 

$$I(i,j) \equiv I(z_i^p, z_j^{D-p}) = \int_M \omega_i^{D-p} \wedge \omega_j^p \tag{13.73}$$

is called the intersection form. From its definition we see that $I(i,j)$ satisfies 
the symmetry 

$$I(i,j) = (-1)^{p(D-p)} I(j,i). \tag{13.74}$$

Less obvious is that $I(i,j)$ is an integer that reports the number of times 
(counted with orientation) that the cycles $z_i^p$ and $z_j^{D-p}$ intersect. This latter 
fact can be understood from our construction of the $\omega_i^p$ as unit currents 
localized near the $z_i^{D-p}$ cycles. The integrand in (13.73) is non-zero only in the 
neighbourhood of the intersections of $z_i^p$ with $z_j^{D-p}$, and at each intersection 
constitutes a $D$-form that integrates up to give $\pm 1$. 

Figure 13.11: The intersection of two cycles: $I(\alpha,\beta) = 1 = 1 - 1 + 1$. 


This claim is illustrated in the left-hand part of figure 13.11, which shows a 
region surrounding the intersection of the $\alpha$ and $\beta$ one-cycles on the 2-torus. 
The co-ordinate system has been chosen so that the $\alpha$ cycle runs along the 
$x$ axis and the $\beta$ cycle along the $y$ axis. Each cycle is surrounded by the 
narrow shaded regions $-w < y < w$ and $-w < x < w$, respectively. To 
construct suitable forms $\omega_\alpha$ and $\omega_\beta$ we select a smooth function $f(x)$ that 
vanishes for $|x| > w$ and such that $\int f\,dx = 1$. In the local chart we can then 
set 

$$\omega_\alpha = f(y)\,dy, \qquad \omega_\beta = -f(x)\,dx,$$

both these forms being closed. The intersection number is given by the 
integral 

$$I(\alpha,\beta) = \int \omega_\alpha \wedge \omega_\beta = \iint f(x) f(y)\,dx\,dy = 1. \tag{13.75}$$

The right-hand part of figure 13.11 illustrates why this intersection number 
depends only on the homology classes of the two one-cycles, and not on their 
particular instantiation as curves. 

We can more conveniently re-express (13.72) in terms of the periods of 
the forms 

$$\int_{z_i^p} f = I(i,k)\,f^k, \qquad \int_{z_j^{D-p}} g = I(j,l)\,g^l, \tag{13.76}$$

as 

$$\int_M f \wedge g = \sum_{i,j} K(i,j) \int_{z_i^p} f \int_{z_j^{D-p}} g, \tag{13.77}$$

where 

$$K(i,j) = I^{-1}(k,i)\,I(k,l)\,I^{-1}(l,j) = I^{-1}(j,i) \tag{13.78}$$

is the transpose of the inverse of the intersection-form matrix. The decomposition 
(13.77) of the integral of the product of a pair of closed forms into 
a bilinear form in their periods is one of the two principal results of this 
section, the other being (13.70). 

In simple cases, we can obtain the decomposition (13.77) by more direct 
methods. Suppose, for example, that we label the cycles generating the 
homology group $H_1(T^2)$ of the 2-torus as $\alpha$ and $\beta$, and that $a$ and $b$ are 
closed ($da = db = 0$), but not necessarily exact, one-forms. We will show 
that 

$$\int_{T^2} a \wedge b = \int_\alpha a \int_\beta b - \int_\alpha b \int_\beta a. \tag{13.79}$$

To do this, we cut the torus along the cycles $\alpha$ and $\beta$ and open it out into 
a rectangle with sides of length $L_x$ and $L_y$. The cycles $\alpha$ and $\beta$ will form 
the sides of the rectangle, and we will take them as lying parallel to the $x$ 
and $y$ axes, respectively. Functions on the torus now become functions on 
the rectangle. Not all functions on the rectangle descend from functions on 
the torus, however. Only those functions that satisfy the periodic boundary 
conditions $f(0,y) = f(L_x,y)$ and $f(x,0) = f(x,L_y)$ can be considered 
(mathematicians would say “can be lifted”) to be functions on the torus. 




Figure 13.12: Cut-open torus 


Since the rectangle (but not the torus) is contractible, we can write $a = df$ 
where $f$ is a function on the rectangle (but not necessarily a function on 
the torus, i.e. $f$ will not, in general, be periodic). Since $a \wedge b = d(fb)$, we can 
now use Stokes’ theorem to evaluate 

$$\int_{T^2} a \wedge b = \int_{T^2} d(fb) = \int_{\partial T^2} fb. \tag{13.80}$$

The two integrals on the two vertical sides of the rectangle can be combined 
to a single integral over the points of the one-cycle $\beta$: 

$$\int_{\rm vert} fb = \int_\beta \left[ f(L_x,y) - f(0,y) \right] b. \tag{13.81}$$

We now observe that $[f(L_x,y) - f(0,y)]$ is a constant, and so can be taken 
out of the integral. It is a constant because all paths from the point $(0, y)$ to 
$(L_x, y)$ are homologous to the one-cycle $\alpha$, so the difference $f(L_x,y) - f(0,y)$ 
is equal to $\int_\alpha a$. Thus, 

$$\int_\beta \left[ f(L_x,y) - f(0,y) \right] b = \int_\alpha a \int_\beta b. \tag{13.82}$$

Similarly, the contribution of the two horizontal sides is 

$$\int_\alpha \left[ f(x,0) - f(x,L_y) \right] b = -\int_\beta a \int_\alpha b. \tag{13.83}$$

On putting the contributions of both pairs of sides together, the claimed 
result follows. 
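
The decomposition (13.79) can also be confirmed numerically. The sketch below makes several assumptions for illustration: a square torus with $L_x = L_y = 1$, two specific closed one-forms built from constant parts plus exact pieces, and hypothetical helper names `line_integral` and `wedge_integral`. It compares $\int_{T^2} a \wedge b$, computed as a double integral of $a_x b_y - a_y b_x$, with the bilinear combination of periods.

```python
import math

L = 1.0                    # assumption: square torus with Lx = Ly = 1
TWO_PI = 2 * math.pi


def a(x, y):
    """Closed one-form (a_x, a_y): constants (2, -1) plus d[sin(2 pi x) cos(2 pi y)]."""
    return (2.0 + TWO_PI * math.cos(TWO_PI * x) * math.cos(TWO_PI * y),
            -1.0 - TWO_PI * math.sin(TWO_PI * x) * math.sin(TWO_PI * y))


def b(x, y):
    """Closed one-form (b_x, b_y): constants (0.5, 3) plus d[cos(2 pi x)]."""
    return (0.5 - TWO_PI * math.sin(TWO_PI * x), 3.0)


def line_integral(form, cycle, n=400):
    """Midpoint-rule period of `form` around the alpha (x) or beta (y) cycle."""
    h = L / n
    mids = [(k + 0.5) * h for k in range(n)]
    if cycle == "alpha":                              # along x at y = 0
        return sum(form(x, 0.0)[0] for x in mids) * h
    return sum(form(0.0, y)[1] for y in mids) * h     # beta: along y at x = 0


def wedge_integral(f, g, n=200):
    """Midpoint-rule integral of f ^ g over the cut-open torus."""
    h = L / n
    mids = [(k + 0.5) * h for k in range(n)]
    total = 0.0
    for x in mids:
        for y in mids:
            fx, fy = f(x, y)
            gx, gy = g(x, y)
            total += fx * gy - fy * gx    # coefficient of dx ^ dy in f ^ g
    return total * h * h


lhs = wedge_integral(a, b)
rhs = (line_integral(a, "alpha") * line_integral(b, "beta")
       - line_integral(b, "alpha") * line_integral(a, "beta"))
print(lhs, rhs)
```

For these forms the periods are $\int_\alpha a = 2$, $\int_\beta a = -1$, $\int_\alpha b = 0.5$ and $\int_\beta b = 3$, so both sides should come out close to $2 \cdot 3 - 0.5 \cdot (-1) = 6.5$; the exact pieces drop out of both computations.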


13.6 Characteristic classes 


A supply of elements of $H^{2m}(M,\mathbb{R})$ and $H^{2m}(M,\mathbb{Z})$ is provided by the characteristic 
classes associated with connections on vector bundles over the manifold $M$. 

Recall that connections appear in covariant derivatives 

$$\nabla_\mu = \partial_\mu + A_\mu, \tag{13.84}$$

and are to be thought of as matrix-valued one-forms $A = A_\mu dx^\mu$. In the 
quantum mechanics of charged particles the covariant derivative that appears 
in the Schrödinger equation is 

$$\nabla_\mu = \frac{\partial}{\partial x^\mu} - ieA_\mu^{\rm Maxwell}. \tag{13.85}$$

Here, $e$ is the charge of the particle on whose wavefunction the derivative 
acts, and $A_\mu^{\rm Maxwell}$ is the usual electromagnetic vector potential. The 
matrix-valued connection one-form is therefore 

$$A = -ieA_\mu^{\rm Maxwell}\, dx^\mu. \tag{13.86}$$
In this case the matrix is one-by-one. 

In a non-abelian gauge theory with gauge group $G$ the connection becomes 

$$A = i\hat\lambda_a A^a_\mu\, dx^\mu. \tag{13.87}$$

The $\hat\lambda_a$ are hermitian matrices that have commutation relations 
$[\hat\lambda_a, \hat\lambda_b] = i f^c_{ab} \hat\lambda_c$, where the $f^c_{ab}$ are the structure constants of the Lie algebra of the 
group $G$. The $\hat\lambda_a$ therefore form a representation of the Lie algebra, and 
this representation plays the role of the “charge” of the non-abelian gauge 
particle. 

For covariant derivatives acting on a tangent vector field $f^a \mathbf{e}_a$ on a Riemann 
$n$-manifold, where the $\mathbf{e}_a$ are an orthonormal vielbein frame, we have 

$$A = \omega_{ab\mu}\, dx^\mu, \tag{13.88}$$

where, for each $\mu$, the coefficients $\omega_{ab\mu} = -\omega_{ba\mu}$ can be thought of as the 
entries in a skew symmetric $n$-by-$n$ matrix. These matrices are elements of 
the Lie algebra $\mathfrak{o}(n)$ of the orthogonal group $O(n)$. 

In all these cases we define the curvature two-form to be $F = dA + A^2$, 
where a combined matrix and wedge product is to be understood in $A^2$. In 
exercises 11.19 and 11.20 you used the Bianchi identity to show that the 
gauge-invariant $2n$-forms $\mathrm{tr}(F^n)$ were closed. The integrals of these forms 
over cycles provide numbers that are topological invariants of the bundle. 
For example, in four-dimensional QCD, the integral 

$$c_2 = -\frac{1}{8\pi^2} \int_\Omega \mathrm{tr}(F^2) \tag{13.89}$$

over a compactified four-dimensional manifold $\Omega$ is an integer that a mathematician 
would call the second Chern number of the non-abelian gauge bundle, 
and that a physicist would call the instanton number of the gauge field 
configuration. The closed forms themselves are called characteristic classes. 

In the following section we will show that the integrals of characteristic 
classes are indeed topological invariants. We also explain something of what 
these invariants are measuring, and illustrate why, when suitably normalized, 
they are integer-valued. 

13.6.1 Topological invariance


Suppose that we have been given a connection $A$ and slightly deform it,
$A \to A + \delta A$. Then $F \to F + \delta F$ where

\[ \delta F = d(\delta A) + \delta A\,A + A\,\delta A. \tag{13.90} \]

Using the Bianchi identity $dF = FA - AF$, we find that

\[ \begin{aligned}
\delta\,{\rm tr}(F^n) &= n\,{\rm tr}(\delta F\,F^{n-1}) \\
 &= n\,{\rm tr}(d(\delta A)F^{n-1}) + n\,{\rm tr}(\delta A\,A\,F^{n-1}) + n\,{\rm tr}(A\,\delta A\,F^{n-1}) \\
 &= n\,{\rm tr}(d(\delta A)F^{n-1}) + n\,{\rm tr}(\delta A\,A\,F^{n-1}) - n\,{\rm tr}(\delta A\,F^{n-1}A) \\
 &= d\left\{ n\,{\rm tr}(\delta A\,F^{n-1}) \right\}.
\end{aligned} \tag{13.91} \]

The last line of (13.91) is equal to the penultimate line because all but the
first and last terms arising from the $dF$’s in $d\{{\rm tr}(\delta A\,F^{n-1})\}$ cancel in pairs. A
globally-defined change in $A$ therefore changes ${\rm tr}(F^n)$ by the $d$ of something,
and so does not change its cohomology class, or its integral over a cycle.

At first sight, this invariance under deformation suggests that all the
${\rm tr}(F^n)$ are exact forms: they can apparently all be written as
${\rm tr}(F^n) = d\omega_{2n-1}(A)$ for some $(2n-1)$-form $\omega_{2n-1}(A)$. To find $\omega_{2n-1}(A)$ all we
have to do is deform the connection to zero by setting $A_t = tA$ and

\[ F_t = dA_t + A_t^2 = t\,dA + t^2A^2. \tag{13.92} \]

Then $\delta A_t = A\,\delta t$, and

\[ \frac{d}{dt}\,{\rm tr}(F_t^n) = d\left\{ n\,{\rm tr}(AF_t^{n-1}) \right\}. \tag{13.93} \]

Integrating up from $t = 0$, we find

\[ {\rm tr}(F^n) = d\left\{ n\int_0^1 {\rm tr}(AF_t^{n-1})\,dt \right\}. \tag{13.94} \]

For example

\[ \begin{aligned}
{\rm tr}(F^2) &= d\left\{ 2\int_0^1 {\rm tr}\left(A(t\,dA + t^2A^2)\right)dt \right\} \\
 &= d\left\{ {\rm tr}\left(A\,dA + \frac{2}{3}A^3\right) \right\}.
\end{aligned} \tag{13.95} \]

You should recognize here the Chern-Simons form $\omega_3(A) = {\rm tr}(A\,dA + \frac{2}{3}A^3)$
of exercise 11.19. The naive conclusion, that all the ${\rm tr}(F^n)$ are exact,
is false, however. What the computation actually shows is that when
$\int {\rm tr}(F^n) \ne 0$ we cannot find a globally defined one-form $A$ representing the
connection or gauge field. With no global $A$, we cannot globally deform $A$
to zero.
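
The $t$-integration in (13.94) only ever produces simple rational coefficients. As a small check, and not as part of the original derivation, the sketch below computes the coefficient $n\int_0^1 t^{n-1+k}\,dt = n/(n+k)$ that multiplies the terms of ${\rm tr}(AF_t^{n-1})$ containing $k$ factors of $A^2$ and $n-1-k$ factors of $dA$. The function name `chern_simons_coefficient` is our own illustrative choice.

```python
from fractions import Fraction

def chern_simons_coefficient(n: int, k: int) -> Fraction:
    """Return n * integral_0^1 t^(n-1+k) dt for a term of tr(A F_t^(n-1))
    with k factors of A^2 and (n-1-k) factors of dA."""
    if not 0 <= k <= n - 1:
        raise ValueError("k must lie between 0 and n-1")
    # F_t = t dA + t^2 A^2, so the term carries t^(n-1-k) * t^(2k) = t^(n-1+k).
    return Fraction(n, n + k)

# n = 2 reproduces the coefficients 1 and 2/3 in tr(A dA + (2/3) A^3).
print(chern_simons_coefficient(2, 0), chern_simons_coefficient(2, 1))
```

For $n = 2$ this gives the coefficients $1$ and $2/3$ of (13.95).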

Consider, for example, an Abelian $U(1)$ gauge field on the two-sphere $S^2$.
When the first Chern number

\[ c_1 = \frac{1}{2\pi i}\int_{S^2} F \tag{13.96} \]

is non-zero, there can be no globally-defined one-form $A$ such that $F = dA$.
Glance back, however, at figure 13.10 on page 527. There we see that
the retractability of the spherical caps $D_\pm$ guarantees that there are
one-forms $A_\pm$ defined on $D_\pm$ such that $F = dA_\pm$ in $D_\pm$. In the cingular region
$D_+ \cap D_-$ where they are both defined, $A_+$ and $A_-$ will be related by a gauge
transformation. For a $U(1)$ gauge field, the matrix $g$ appearing in the general
gauge transformation rule

\[ A \to A^g = g^{-1}Ag + g^{-1}dg, \tag{13.97} \]

of exercise 11.20 becomes the phase $e^{i\chi} \in U(1)$. Consequently

\[ A_+ = A_- + e^{-i\chi}\,d e^{i\chi} = A_- + i\,d\chi \quad \hbox{in } D_+ \cap D_-. \tag{13.98} \]

The $U(1)$ group element $e^{i\chi}$ is required to be single valued in $D_+ \cap D_-$, but
the angle $\chi$ may be multivalued. We now write $c_1$ as the sum of integrals over
the north and south hemispheres of $S^2$, and use Stokes’ theorem to reduce
this sum to a single integral over the hemispheres’ common boundary, the
equator $\Gamma$:

\[ \begin{aligned}
c_1 &= \frac{1}{2\pi i}\int_{\rm north} F + \frac{1}{2\pi i}\int_{\rm south} F \\
 &= \frac{1}{2\pi i}\int_{\rm north} dA_+ + \frac{1}{2\pi i}\int_{\rm south} dA_- \\
 &= \frac{1}{2\pi i}\int_\Gamma A_+ - \frac{1}{2\pi i}\int_\Gamma A_- \\
 &= \frac{1}{2\pi}\int_\Gamma d\chi.
\end{aligned} \tag{13.99} \]

We see that $c_1$ is an integer that counts the winding of $\chi$ as we circle $\Gamma$. A
non-zero integer cannot be continuously reduced to zero, and if we attempt
to deform $A \to tA \to 0$, we will violate the required single-valuedness of the
$U(1)$ group element $e^{i\chi}$.
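
The integer in (13.99) can be read off from the single-valued phase factor alone, without ever choosing a branch of $\chi$. The following minimal sketch accumulates small phase increments of $e^{i\chi}$ around the equator. The function `winding_number` and the sample transition function $\chi = 3\phi$ are hypothetical illustrations, and the estimate assumes that the sampling is fine enough for consecutive phases to differ by less than $\pi$.

```python
import cmath
import math

def winding_number(phase_factor, samples: int = 2000) -> int:
    """Estimate (1/2pi) times the closed integral of d(chi) around the equator,
    given only the single-valued U(1) element g(phi) = exp(i chi(phi))."""
    total = 0.0
    prev = phase_factor(0.0)
    for k in range(1, samples + 1):
        cur = phase_factor(2.0 * math.pi * k / samples)
        # Phase increment in (-pi, pi]; valid when consecutive samples are close.
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2.0 * math.pi))

# Hypothetical transition function with chi = 3 phi.
print(winding_number(lambda phi: cmath.exp(3j * phi)))
```

Because only ratios of neighbouring values of $e^{i\chi}$ enter, the result is insensitive to the multivaluedness of $\chi$ itself.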

Although the Chern-Simons forms $\omega_{2n-1}(A)$ cannot be defined globally,
they are still very useful in physics. They occur as Wess-Zumino terms
describing the low-energy properties of various quantum field theories, the
prototype being the Skyrme-Witten model of hadrons.³


13.6.2 Chern characters and Chern classes 


Any gauge-invariant polynomial (with exterior multiplication of forms
understood) in $F$ provides a closed, topologically invariant, differential form.
Certain combinations, however, have additional desirable properties, and so
have been given names.

The form

\[ {\rm ch}_n(F) = {\rm tr}\left\{ \frac{1}{n!}\left( \frac{i}{2\pi}F \right)^n \right\} \tag{13.100} \]

is called the $n$-th Chern character. It is convenient to think of this $2n$-form
as being the $n$-th term in a generating-function expansion

\[ {\rm ch}(F) \stackrel{\rm def}{=} {\rm tr}\left\{ \exp\left( \frac{i}{2\pi}F \right) \right\} = {\rm ch}_0(F) + {\rm ch}_1(F) + {\rm ch}_2(F) + \cdots, \tag{13.101} \]

where ${\rm ch}_0(F) \stackrel{\rm def}{=} {\rm tr}\,I$ is the dimension of the space on which the $\lambda_a$ act. This
formal sum of forms of different degree is called the total Chern character.
The $n!$ normalization is chosen because it makes the Chern character behave
nicely when we combine vector bundles, as we now do.

Given two vector bundles over the same manifold, having fibres $U_x$ and $V_x$
over the point $x$, we can make a new bundle with the direct sum $U_x \oplus V_x$ as
fibre over $x$. This resulting bundle is called the Whitney sum of the bundles.
Similarly we can make a tensor-product bundle whose fibre over $x$ is $U_x \otimes V_x$.

Let us use the notation ${\rm ch}(U)$ to represent the Chern character of the
bundle with fibres $U_x$, and $U \oplus V$ to denote the Whitney sum. Then we have

\[ {\rm ch}(U \oplus V) = {\rm ch}(U) + {\rm ch}(V), \tag{13.102} \]

and

\[ {\rm ch}(U \otimes V) = {\rm ch}(U) \wedge {\rm ch}(V). \tag{13.103} \]

³ E. Witten, Nucl. Phys. B223 (1983) 422; ibid. B223 (1983) 433.

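The second of these formulae comes about because if $\lambda^{(1)}_a$ is a Lie algebra
element acting on $V^{(1)}$ and $\lambda^{(2)}_a$ the corresponding element acting on $V^{(2)}$,
then they act on the tensor product $V^{(1)} \otimes V^{(2)}$ as

\[ \lambda^{(1\otimes 2)}_a = \lambda^{(1)}_a \otimes I + I \otimes \lambda^{(2)}_a, \tag{13.104} \]

where $I$ is the identity operator on the appropriate space in the tensor
product, and for matrices $A$ and $B$ we have

\[ {\rm tr}\{\exp(A \otimes I + I \otimes B)\} = {\rm tr}\{\exp A \otimes \exp B\} = {\rm tr}\{\exp A\}\,{\rm tr}\{\exp B\}. \tag{13.105} \]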
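
The trace identity (13.105) is easy to test numerically. The sketch below is only an illustration: it uses two arbitrary $2$-by-$2$ matrices of our own choosing, a Kronecker product to represent $A \otimes I + I \otimes B$, and a truncated Taylor series for the matrix exponential, which is adequate for small matrices of modest norm but is not a general-purpose algorithm.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product, representing the tensor product of operators."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def expm(a, terms=40):
    """Matrix exponential from a truncated Taylor series (small matrices only)."""
    result = identity(len(a))
    power = identity(len(a))
    for k in range(1, terms):
        # After this step, power holds a^k / k!.
        power = [[x / k for x in row] for row in matmul(power, a)]
        result = add(result, power)
    return result

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Arbitrary illustrative matrices (not taken from the text).
A = [[0.3, 1.0], [-0.5, 0.2]]
B = [[-0.1, 0.4], [0.7, 0.6]]
lhs = trace(expm(add(kron(A, identity(2)), kron(identity(2), B))))
rhs = trace(expm(A)) * trace(expm(B))
print(abs(lhs - rhs) < 1e-9)
```

The identity holds because $A \otimes I$ and $I \otimes B$ commute, so the exponential of their sum factorizes.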
In terms of the individual ${\rm ch}_n(V)$, equations (13.102) and (13.103) read

\[ {\rm ch}_n(U \oplus V) = {\rm ch}_n(U) + {\rm ch}_n(V), \tag{13.106} \]

and

\[ {\rm ch}_n(U \otimes V) = \sum_{m=0}^{n} {\rm ch}_{n-m}(U) \wedge {\rm ch}_m(V). \tag{13.107} \]

Related to the Chern characters are the Chern classes. These are
wedge-product polynomials in the Chern characters, and are defined, via the matrix
expansion

\[ \det(I + A) = 1 + {\rm tr}\,A + \frac{1}{2}\left( ({\rm tr}\,A)^2 - {\rm tr}\,A^2 \right) + \cdots, \tag{13.108} \]

by the generating function for the total Chern class:

\[ c(F) = \det\left( I + \frac{i}{2\pi}F \right) = 1 + c_1(F) + c_2(F) + \cdots. \tag{13.109} \]

Thus

\[ c_1(F) = {\rm ch}_1(F), \qquad c_2(F) = \frac{1}{2}\,{\rm ch}_1(F) \wedge {\rm ch}_1(F) - {\rm ch}_2(F), \tag{13.110} \]

and so on.

For matrices $A$ and $B$ we have $\det(A \oplus B) = \det(A)\det(B)$, and this
leads to

\[ c(U \oplus V) = c(U) \wedge c(V). \tag{13.111} \]
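
The relations (13.110) and (13.111) are polynomial identities, so they can be checked with ordinary numbers standing in for the even-degree (and hence commuting) forms. The sketch below assumes, purely for illustration, that $iF/2\pi$ can be replaced by a diagonal matrix with formal eigenvalues $x_i$, so that ${\rm ch}_n = \sum_i x_i^n/n!$ and $c_n$ is the $n$-th elementary symmetric polynomial. The helper names and the sample eigenvalues are our own.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial, prod

def chern_character(roots, n):
    """ch_n = sum_i x_i^n / n! for formal eigenvalues x_i of iF/2pi."""
    return sum(Fraction(x) ** n for x in roots) / factorial(n)

def chern_class(roots, n):
    """c_n = n-th elementary symmetric polynomial, from det(1 + X) = prod(1 + x_i)."""
    return sum(prod(c) for c in combinations(roots, n))

def total_chern_class(roots):
    return sum(chern_class(roots, n) for n in range(len(roots) + 1))

# Arbitrary illustrative eigenvalues for two bundles U and V.
U = [Fraction(1, 2), Fraction(-3), Fraction(2, 5)]
V = [Fraction(7, 3), Fraction(1)]
ch1, ch2 = chern_character(U, 1), chern_character(U, 2)

print(chern_class(U, 1) == ch1)                      # c_1 = ch_1
print(chern_class(U, 2) == ch1 * ch1 / 2 - ch2)      # c_2 = ch_1^2/2 - ch_2
# The Whitney sum corresponds to concatenating the eigenvalue lists.
print(total_chern_class(U + V) == total_chern_class(U) * total_chern_class(V))
```

Exact rational arithmetic is used so that the comparisons are equalities rather than approximate checks.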
Although the Chern classes are more complicated in appearance than the
Chern characters, they are introduced because their integrals over cycles turn
out to be integers, and this property remains true of integer-coefficient sums
of products of Chern classes. The cohomology classes $[c_n(F)]$ are therefore
elements of the integer cohomology ring $H^*(M, {\bf Z})$. This property does not
hold for the Chern characters, whose integrals over cycles can be fractions.
The cohomology classes $[{\rm ch}_n(F)]$ are therefore only elements of $H^*(M, {\bf Q})$.

When we integrate products of Chern classes of total degree 2m over 
closed 2m-dimensional orientable manifolds we get integer Chern numbers. 
These integers can be related to generalized winding numbers, and character- 
ize the extent to which the gauge transformations that relate the connection 
fields in different patches serve to twist the vector bundle. Unfortunately 
it requires a considerable amount of combinatorial machinery (the Schubert 
calculus of complex Grassmannians) to explain these integers. 


Pontryagin and Euler classes 


When the fibres of a vector bundle are vector spaces over ${\bf R}$, the complex
skew-hermitian matrices $i\lambda_a$ are replaced by real skew symmetric matrices.
The Lie algebra of the $n$-by-$n$ matrices $i\lambda_a$ was a subalgebra of $\mathfrak{u}(n)$. The Lie
algebra of the $n$-by-$n$ real, skew symmetric, matrices is a subalgebra of $\mathfrak{o}(n)$.
Now, the trace of an odd power of any skew symmetric matrix is zero. As a
consequence, Chern characters and Chern classes containing an odd number
of $F$’s all vanish. The remaining real $4n$-forms are known as Pontryagin
classes. The precise definition is

\[ p_k(V) = (-1)^k c_{2k}(V). \tag{13.112} \]


Pontryagin classes help to classify bundles whose gauge transformations
are elements of $O(n)$. If we restrict ourselves to gauge transformations that lie
in $SO(n)$, as we would when considering the tangent bundle of an orientable
Riemann manifold, then we can make a gauge-invariant polynomial out of
the skew-symmetric matrix-valued $F$ by forming its Pfaffian.

Recall (or see exercise A.18) that the Pfaffian of a skew symmetric
$2n$-by-$2n$ matrix $A$ with entries $a_{ij}$ is

\[ {\rm Pf}\,A = \frac{1}{2^n n!}\,\epsilon_{i_1 \ldots i_{2n}}\,a_{i_1 i_2} \cdots a_{i_{2n-1} i_{2n}}. \tag{13.113} \]
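
The definition (13.113) can be evaluated directly for small matrices by summing over all permutations, with the sign of the permutation playing the role of the $\epsilon$ symbol. The sketch below is a brute-force illustration with function names of our own choosing. Its cost grows like $(2n)!$, so it is meant only for checking small cases such as the familiar $4$-by-$4$ result ${\rm Pf}\,A = a_{12}a_{34} - a_{13}a_{24} + a_{14}a_{23}$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def permutation_sign(perm):
    """Sign of a permutation of 0..N-1, found by counting inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def pfaffian(a):
    """Pfaffian of a skew-symmetric 2n-by-2n matrix from the epsilon-tensor sum."""
    size = len(a)
    n = size // 2
    total = Fraction(0)
    for perm in permutations(range(size)):
        term = Fraction(permutation_sign(perm))
        for k in range(n):
            term *= a[perm[2 * k]][perm[2 * k + 1]]
        total += term
    return total / (2 ** n * factorial(n))

# Illustrative matrix with a12=1, a13=2, a14=3, a23=4, a24=5, a34=6.
A = [[0, 1, 2, 3],
     [-1, 0, 4, 5],
     [-2, -4, 0, 6],
     [-3, -5, -6, 0]]
print(pfaffian(A))  # a12*a34 - a13*a24 + a14*a23 = 6 - 10 + 12
```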
The Euler class of the tangent bundle of a $2n$-dimensional orientable manifold
is defined via its skew-symmetric Riemann-curvature form

\[ R = \frac{1}{2}R_{ab\mu\nu}\,dx^\mu dx^\nu \tag{13.114} \]

to be

\[ e(R) = {\rm Pf}\left( \frac{1}{2\pi}R \right). \tag{13.115} \]

In four dimensions, for example, this becomes the 4-form

\[ e(R) = \frac{1}{32\pi^2}\,\epsilon_{abcd}\,R_{ab}R_{cd}. \tag{13.116} \]

The generalized Gauss-Bonnet theorem asserts, for an oriented,
even-dimensional manifold without boundary, that the Euler character is given
by

\[ \chi(M) = \int_M e(R). \tag{13.117} \]

We will not prove this theorem, but in section 16.3.6 we will illustrate the
strategy that leads to Chern’s influential proof.


Exercise 13.5: Show that

\[ c_3(F) = \frac{1}{6}\left( ({\rm ch}_1(F))^3 - 6\,{\rm ch}_1(F)\,{\rm ch}_2(F) + 12\,{\rm ch}_3(F) \right). \]


13.7 Hodge theory and the Morse index 


The Laplacian, when acting on a scalar function $\phi$ in ${\bf R}^3$ is simply ${\rm div}\,({\rm grad}\,\phi)$,
but when acting on a vector ${\bf v}$ it becomes

\[ \nabla^2{\bf v} = {\rm grad}\,({\rm div}\,{\bf v}) - {\rm curl}\,({\rm curl}\,{\bf v}). \tag{13.118} \]

Why this weird expression? How should the Laplacian act on other types of
fields?

For general curvilinear co-ordinates in ${\bf R}^n$, a reasonable definition for the
Laplacian of a vector or tensor field ${\bf T}$ is $\nabla^2{\bf T} = g^{\mu\nu}\nabla_\mu\nabla_\nu{\bf T}$ where $\nabla_\mu$ is the
flat-space covariant derivative. This is the unique co-ordinate independent
object that reduces in Cartesian co-ordinates to the ordinary Laplacian acting
on the individual components of ${\bf T}$. The proof that the rather
different-seeming (13.118) holds for vectors is that it too is constructed out of
co-ordinate independent operations, and in Cartesian co-ordinates reduces to
the ordinary Laplacian acting on the individual components of ${\bf v}$. It must
therefore coincide with the covariant derivative definition. Why it should
work out this way is not exactly obvious. Now, div, grad and curl can all be
expressed in differential-form language, and therefore so can the scalar and
vector Laplacian. Moreover, when we let the Laplacian act on any $p$-form
the general pattern becomes clear. The differential-form definition of the
Laplacian, and the exploration of its consequences, was the work of William
Hodge in the 1930’s. His theory has natural applications to the topology of
manifolds.


13.7.1 The Laplacian on p-forms 


Suppose that $M$ is an oriented, compact, $D$-dimensional manifold without
boundary. We can make the space $\Omega^p(M)$ of $p$-form fields on $M$ into an $L^2$
Hilbert space by introducing the positive-definite inner product

\[ \langle a, b\rangle_p = \langle b, a\rangle_p = \int_M a \star b = \frac{1}{p!}\int_M d^Dx\,\sqrt{g}\;a_{i_1 i_2 \ldots i_p}b^{i_1 i_2 \ldots i_p}. \tag{13.119} \]

Here, the subscript $p$ denotes the order of the forms in the product, and
should not be confused with the $p$ we have elsewhere used to label the
norm in $L^p$ Banach spaces. The presence of the $\sqrt{g}$ and the Hodge $\star$ operator
tells us that this inner product depends on both the metric on $M$ and the
global orientation.

We can use this new inner product to define a “hermitian adjoint” $\delta \equiv d^\dagger$
of the exterior differential operator $d$. The inverted commas are there because
this hermitian adjoint is not quite an adjoint operator in the normal sense
($d$ takes us from one vector space to another), but it is constructed in an
analogous manner. We define $\delta$ by requiring that

\[ \langle da, b\rangle_{p+1} = \langle a, \delta b\rangle_p, \tag{13.120} \]

where $a$ is an arbitrary $p$-form and $b$ an arbitrary $(p+1)$-form. Now recall
that $\star$ takes $p$-forms to $(D-p)$-forms, and so $d\star b$ is a $(D-p)$-form. Acting
twice on a $(D-p)$-form with $\star$ gives us back the original form multiplied by
$(-1)^{p(D-p)}$. We use this to compute

\[ \begin{aligned}
d(a \star b) &= da \star b + (-1)^p a\,(d\star b) \\
 &= da \star b + (-1)^p(-1)^{p(D-p)}\,a \star(\star d\star b) \\
 &= da \star b - (-1)^{Dp+1}\,a \star(\star d\star b).
\end{aligned} \tag{13.121} \]

In obtaining the last line we have observed that $p(p-1)$ is an even integer
and so $(-1)^{p(1-p)} = 1$. Now, using Stokes’ theorem, and the absence of a
boundary to discard the integrated-out part, we conclude that

\[ \int_M (da) \star b = (-1)^{Dp+1}\int_M a \star(\star d\star b), \tag{13.122} \]

that is,

\[ \langle da, b\rangle_{p+1} = (-1)^{Dp+1}\langle a, (\star d\star)b\rangle_p, \tag{13.123} \]

and so $\delta b = (-1)^{Dp+1}(\star d\star)b$. This was for $\delta$ acting on a $(p+1)$-form. Acting
on a $p$-form instead we have

\[ \delta = (-1)^{Dp+D+1}\star d\star. \tag{13.124} \]
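
The sign bookkeeping that leads to (13.124) is easy to get wrong, so a brute-force parity check can be reassuring. The sketch below, whose function names are our own, verifies for a range of dimensions $D$ and degrees $p$ that $(-1)^p(-1)^{p(D-p)} = -(-1)^{Dp+1}$, and that replacing $p$ by $p-1$ in $(-1)^{Dp+1}$ reproduces $(-1)^{Dp+D+1}$.

```python
def sign(exponent: int) -> int:
    """Return (-1)**exponent for any integer exponent."""
    return -1 if exponent % 2 else 1

def check_adjoint_signs(max_dim: int = 6) -> bool:
    """Check the parity identities used to derive the sign of delta."""
    for D in range(1, max_dim + 1):
        for p in range(0, D + 1):
            # Sign produced in the second line of (13.121).
            if sign(p) * sign(p * (D - p)) != -sign(D * p + 1):
                return False
            # Shifting p -> p - 1 in (-1)^(Dp+1) gives the sign in (13.124).
            if sign(D * (p - 1) + 1) != sign(D * p + D + 1):
                return False
    return True

print(check_adjoint_signs())
```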
Observe how the sequence of maps in $\star d\star$ works:

\[ \Omega^p(M) \xrightarrow{\ \star\ } \Omega^{D-p}(M) \xrightarrow{\ d\ } \Omega^{D-p+1}(M) \xrightarrow{\ \star\ } \Omega^{p-1}(M). \tag{13.125} \]

The net effect is that $\delta$ takes a $p$-form to a $(p-1)$-form. Observe also that
$\delta^2 \propto \star d^2 \star = 0$.

We now define a second-order partial differential operator $\Delta_p$ to be the
combination

\[ \Delta_p = \delta d + d\delta, \tag{13.126} \]

acting on $p$-forms. This maps a $p$-form to a $p$-form. A slightly tedious
calculation in Cartesian co-ordinates will show that, for flat space,

\[ \Delta_p = -\nabla^2 \tag{13.127} \]

on each component of a $p$-form. This $\Delta_p$ is therefore the natural definition
for (minus) the Laplacian acting on differential forms. It is usually called the
Laplace-Beltrami operator.

Using $\langle a, db\rangle = \langle \delta a, b\rangle$ we have

\[ \langle(\delta d + d\delta)a, b\rangle_p = \langle\delta a, \delta b\rangle_{p-1} + \langle da, db\rangle_{p+1} = \langle a, (\delta d + d\delta)b\rangle_p, \tag{13.128} \]

and so we deduce that $\Delta_p$ is self-adjoint on $\Omega^p(M)$. When $b = a$ the middle
terms in (13.128) are both non-negative, so we also see that $\Delta_p$ is a positive
operator, i.e. all its eigenvalues are positive or zero.

Suppose that $\Delta_p a = 0$. Then (13.128) for $a = b$ becomes

\[ 0 = \langle\delta a, \delta a\rangle_{p-1} + \langle da, da\rangle_{p+1}. \tag{13.129} \]

Because both of these inner products are positive or zero, the vanishing of
their sum requires them to be individually zero. Thus $\Delta_p a = 0$ implies that
$da = \delta a = 0$. By analogy with harmonic functions, we call a form that is
annihilated by $\Delta_p$ a harmonic form. Recall that a form $a$ is closed if $da = 0$.
We correspondingly say that $a$ is co-closed if $\delta a = 0$. A differential form is
therefore harmonic if and only if it is both closed and co-closed.

When a self-adjoint operator $A$ is Fredholm (i.e. the solutions of the
equation $Ax = y$ are governed by the Fredholm alternative) the vector space on
which $A$ acts is decomposed into a direct sum of the kernel and range of the
operator

\[ V = {\rm Ker}(A) \oplus {\rm Im}(A). \tag{13.130} \]

It may be shown that our Laplace-Beltrami $\Delta_p$ is a Fredholm operator, and
so for any $p$-form $\omega$ there is an $\eta$ such that $\omega$ can be written as

\[ \begin{aligned}
\omega &= (d\delta + \delta d)\eta + \gamma \\
 &= d\alpha + \delta\beta + \gamma,
\end{aligned} \tag{13.131} \]

where $\alpha = \delta\eta$, $\beta = d\eta$, and $\gamma$ is harmonic. This result is known as the
Hodge decomposition of $\omega$. It is a form-language generalization of the
Hodge-Weyl and Helmholtz-Hodge decompositions of chapter 6. It is easy to
see that $\alpha$, $\beta$ and $\gamma$ are uniquely determined by $\omega$. If they were not, then we
could find some $\alpha$, $\beta$ and $\gamma$ such that

\[ 0 = d\alpha + \delta\beta + \gamma \tag{13.132} \]

with non-zero $d\alpha$, $\delta\beta$ and $\gamma$. To see that this is not possible, take the $d$ of
(13.132) and then the inner product of the result with $\beta$. Because $d(d\alpha) =
d\gamma = 0$, we end up with

\[ \begin{aligned}
0 &= \langle\beta, d\delta\beta\rangle \\
 &= \langle\delta\beta, \delta\beta\rangle.
\end{aligned} \tag{13.133} \]

Thus $\delta\beta = 0$. Now apply $\delta$ to the two remaining terms of (13.132) and take
an inner product with $\alpha$. Because $\delta\gamma = 0$, we find $\langle d\alpha, d\alpha\rangle = 0$, and so
$d\alpha = 0$. What now remains of (13.132) asserts that $\gamma = 0$.
Suppose that $\omega$ is closed. Then our strategy of taking the $d$ of the
decomposition

\[ \omega = d\alpha + \delta\beta + \gamma, \tag{13.134} \]

followed by an inner product with $\beta$ leads to $\delta\beta = 0$. A closed form can thus
be decomposed as

\[ \omega = d\alpha + \gamma, \tag{13.135} \]

with $\alpha$ and $\gamma$ unique. Each cohomology class in $H^p(M)$ therefore contains
a unique harmonic representative. Since any harmonic form is closed,
and hence a representative of some cohomology class, we conclude that there
is a 1-1 correspondence between $p$-form solutions of Laplace’s equation and
elements of $H^p(M)$. In particular

\[ \dim({\rm Ker}\,\Delta_p) = \dim(H^p(M)) \equiv b_p. \tag{13.136} \]

Here $b_p$ is the $p$-th Betti number. From this and the definition of the
Euler character (13.35) we immediately deduce that

\[ \chi(M) = \sum_{p=0}^{D}(-1)^p\dim({\rm Ker}\,\Delta_p), \tag{13.137} \]

where $\chi(M)$ is the Euler character of the manifold $M$. There is therefore
an intimate relationship between the null-spaces of the second-order partial
differential operators $\Delta_p$ and the global topology of the manifold in which
they live. This is an example of an index theorem.
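
A finite-dimensional analogue may help make (13.136) and (13.137) concrete. The sketch below is not the manifold computation itself. It assumes we replace $M$ by a triangulation (here the hollow tetrahedron, a triangulated two-sphere) and $d$ by the transpose of the simplicial boundary map $\partial_p$, in which case the kernel of the combinatorial Laplacian on $p$-chains has dimension $\dim C_p - {\rm rank}\,\partial_p - {\rm rank}\,\partial_{p+1}$. All function names are our own.

```python
from fractions import Fraction
from itertools import combinations

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def boundary_matrix(simplices, faces):
    """Matrix of the boundary map from p-simplices to (p-1)-simplices."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for k in range(len(s)):
            # Deleting the k-th vertex gives a face with sign (-1)^k.
            mat[index[s[:k] + s[k + 1:]]][j] = (-1) ** k
    return mat

# Hollow tetrahedron: 4 vertices, 6 edges, 4 triangular faces.
chains = [list(combinations(range(4), k)) for k in (1, 2, 3)]
ranks = [0] + [rank(boundary_matrix(chains[p], chains[p - 1])) for p in (1, 2)] + [0]
betti = [len(chains[p]) - ranks[p] - ranks[p + 1] for p in range(3)]
euler = sum((-1) ** p * b for p, b in enumerate(betti))
print(betti, euler)
```

The Betti numbers $(1, 0, 1)$ and Euler character $2$ obtained here are those of the two-sphere.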

Just as for the ordinary Laplace operator, $\Delta_p$ has a complete set of
eigenfunctions with associated eigenvalues $\lambda$. Because the manifold is compact
and hence has finite volume, the spectrum will be discrete. Remarkably, the
topological influence we uncovered above is restricted to the zero-eigenvalue
spaces of $p$-forms. To see this, suppose that we have a $p$-form eigenfunction
$u_\lambda$ for $\Delta_p$:

\[ \Delta_p u_\lambda = \lambda u_\lambda. \tag{13.138} \]

Then

\[ \begin{aligned}
\lambda\,du_\lambda &= d\,\Delta_p u_\lambda \\
 &= d(d\delta + \delta d)u_\lambda \\
 &= (d\delta)\,du_\lambda \\
 &= (\delta d + d\delta)\,du_\lambda \\
 &= \Delta_{p+1}\,du_\lambda.
\end{aligned} \tag{13.139} \]

Thus, provided it is not identically zero, $du_\lambda$ is a $(p+1)$-form eigenfunction
of $\Delta_{p+1}$ with eigenvalue $\lambda$. Similarly, $\delta u_\lambda$ is a $(p-1)$-form eigenfunction
also with eigenvalue $\lambda$.

Can $du_\lambda$ be zero? Yes! It will certainly be zero if $u_\lambda$ itself is the $d$ of
something. What is less obvious is that it will be zero only if it is the $d$ of
something. To see this suppose that $du_\lambda = 0$ and $\lambda \ne 0$. Then

\[ \lambda u_\lambda = (\delta d + d\delta)u_\lambda = d(\delta u_\lambda). \tag{13.140} \]

Thus $du_\lambda = 0$ implies that $u_\lambda = d\eta$, where $\eta = \delta u_\lambda/\lambda$. We see that for $\lambda$
non-zero, the operators $d$ and $\delta$ map the $\lambda$ eigenspaces of $\Delta$ into one another,
and the kernel of $d$ acting on $p$-form eigenfunctions is precisely the image of
$d$ acting on $(p-1)$-form eigenfunctions. In other words, when restricted to
positive $\lambda$ eigenspaces of $\Delta$, the cohomology is trivial.

The set of spaces $V^p_\lambda$ together with the maps $d: V^p_\lambda \to V^{p+1}_\lambda$ therefore
constitute an exact sequence when $\lambda \ne 0$, and so the alternating sum of their
dimensions must be zero. We have therefore established that

\[ \sum_p (-1)^p\dim V^p_\lambda = \begin{cases} \chi(M), & \lambda = 0, \\ 0, & \lambda \ne 0. \end{cases} \tag{13.141} \]

All the topological information resides in the null-spaces, therefore.


Exercise 13.6: Show that if $\omega$ is closed and co-closed then so is $\star\omega$. Deduce
that for a compact orientable $D$-manifold we have $b_p = b_{D-p}$. This
observation therefore gives another way of understanding Poincaré duality.


13.7.2 Morse theory 


Suppose, as in the previous section, that $M$ is a $D$-dimensional compact
manifold without boundary and $V: M \to {\bf R}$ a smooth function. The global
topology of $M$ imposes some constraints on the possible maxima, minima
and saddle points of $V$. Suppose that $P$ is a stationary point of $V$. Taking
co-ordinates such that $P$ is at $x^\mu = 0$, we can expand

\[ V(x) = V(0) + \frac{1}{2}H_{\mu\nu}x^\mu x^\nu + \ldots. \tag{13.142} \]

Here, the matrix $H_{\mu\nu}$ is the Hessian

\[ H_{\mu\nu} \stackrel{\rm def}{=} \left.\frac{\partial^2 V}{\partial x^\mu \partial x^\nu}\right|_0. \tag{13.143} \]

We can change co-ordinates so as to reduce the Hessian to a canonical form
which is diagonal and has only $\pm 1, 0$ on its diagonal:

\[ H_{\mu\nu} = \begin{pmatrix} -I_m & & \\ & I_n & \\ & & 0_{D-m-n} \end{pmatrix}. \tag{13.144} \]

If there are no zeros on the diagonal then the stationary point is said to be
non-degenerate. The number $m$ of downward-bending directions is then
called the index of $V$ at $P$. If $P$ were a local maximum, then $m = D$, $n = 0$.
If it were a local minimum then $m = 0$, $n = D$. When all its stationary
points are non-degenerate, $V$ is said to be a Morse function. This is the
generic case. Degenerate stationary points can be regarded as arising from
the merging of two or more non-degenerate points.

The Morse index theorem asserts that if $V$ is a Morse function, and if
we define $N_0$ to be the number of stationary points with index 0 (i.e. local
minima), and $N_1$ to be the number of stationary points with index 1 etc.,
then

\[ \sum_{m=0}^{D}(-1)^m N_m = \chi(M). \tag{13.145} \]

Here $\chi(M)$ is the Euler character of $M$. Thus, a function on the
two-dimensional torus (which has $\chi = 0$) can have a local maximum, a local
minimum and two saddle points, but cannot have only one local maximum,
one local minimum and no saddle points. On a two-sphere ($\chi = 2$), if $V$ has
one local maximum and one local minimum it can have no saddle points.
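
The torus example can be made explicit. As an illustration of our own choosing (it is not taken from the text), consider $V(\theta, \phi) = \cos\theta + \cos\phi$ on the torus with angular co-ordinates $\theta, \phi$. Its gradient vanishes only where $\sin\theta = \sin\phi = 0$, and the Hessian there is diagonal with entries $-\cos\theta$ and $-\cos\phi$. The sketch below classifies the four stationary points by index and forms the alternating sum in (13.145).

```python
import math
from itertools import product

def morse_counts_on_torus():
    """Count stationary points of V = cos(theta) + cos(phi) by Morse index."""
    counts = [0, 0, 0]
    for theta, phi in product((0.0, math.pi), repeat=2):
        # The gradient (-sin theta, -sin phi) vanishes here; the Hessian is diagonal.
        hessian_diagonal = (-math.cos(theta), -math.cos(phi))
        index = sum(1 for h in hessian_diagonal if h < 0)
        counts[index] += 1
    return counts

counts = morse_counts_on_torus()
print(counts, sum((-1) ** m * n for m, n in enumerate(counts)))
```

The counts $N_0 = 1$, $N_1 = 2$, $N_2 = 1$ give an alternating sum of zero, in agreement with $\chi = 0$ for the torus.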
Closely related to the Morse index theorem is the Poincaré-Hopf theorem.
This counts the isolated zeros of a tangent-vector field $X$ on a compact
$D$-manifold and, amongst other things, explains why we cannot comb a hairy
ball. An isolated zero is a point $z_n$ at which $X$ becomes zero, and that has
a neighbourhood in which there is no other zero. If $X$ possesses only finitely
many zeros then each of them will be isolated. For an isolated zero, we can
define a vector field index at $z_n$ by surrounding it with a small $(D-1)$-sphere
on which $X$ does not vanish. The direction of $X$ at each point on this sphere
then provides a map from the sphere to itself. The index $i(z_n)$ is defined to
be the winding number (Brouwer degree) of this map. This index can be any
integer, but in the special case that $X$ is the gradient of a Morse function it
takes the value $i(z_n) = (-1)^m$ where $m$ is the Morse index at $z_n$.
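
In two dimensions the index is just the number of times the direction of $X$ rotates as we go once around a small circle enclosing the zero. The sketch below estimates this numerically for a planar field with an isolated zero at the origin. The function name, the circle radius and the sample fields are our own illustrative assumptions, and the method presumes the sampling is fine enough that the direction changes by less than $\pi$ between neighbouring sample points.

```python
import math

def vector_field_index(field, radius: float = 1e-2, samples: int = 2000) -> int:
    """Winding number of the direction of `field` around a small circle
    centred on an isolated zero at the origin of the plane."""
    total = 0.0
    x, y = field(radius, 0.0)
    prev = math.atan2(y, x)
    for k in range(1, samples + 1):
        angle = 2.0 * math.pi * k / samples
        x, y = field(radius * math.cos(angle), radius * math.sin(angle))
        cur = math.atan2(y, x)
        # Wrap the increment into [-pi, pi) before accumulating.
        total += (cur - prev + math.pi) % (2.0 * math.pi) - math.pi
        prev = cur
    return round(total / (2.0 * math.pi))

# Gradients of x^2 + y^2 (minimum, m = 0) and x^2 - y^2 (saddle, m = 1).
print(vector_field_index(lambda x, y: (2 * x, 2 * y)),
      vector_field_index(lambda x, y: (2 * x, -2 * y)))
```

The two gradient fields give $+1$ and $-1$, as expected from $i(z_n) = (-1)^m$.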


Figure 13.13: Two-dimensional vector-fields and their streamlines near zeros
with indices a) $i(z_a) = +1$, b) $i(z_b) = -1$, c) $i(z_c) = +1$.


The Poincaré-Hopf theorem states that, for a compact manifold without
boundary, and for a tangent vector field with only finitely many zeros,

\[ \sum_{{\rm zeros}\ n} i(z_n) = \chi(M). \tag{13.146} \]

A tangent-vector field must therefore always have at least one zero unless
$\chi(M) = 0$. For example, since the two-sphere has $\chi = 2$, it cannot be
combed.

Figure 13.14: Gradient vector field and streamlines in a two-simplex.


If one is prepared to believe that $\sum_{{\rm zeros}\ n} i(z_n)$ is the same integer for all
tangent vector fields $X$ on $M$, it is simple to show that this integer must
be equal to the Euler character of $M$. Consider, for ease of visualization,
a two-manifold. Triangulate $M$ and take $X$ to be the gradient field of a
function with local minima at each vertex, saddle points on the edges, and
local maxima at the centre of each face (see figure 13.14). Each minimum and
each maximum contributes $+1$ to the index sum and each saddle contributes $-1$,
so this particular field $X$ has

\[ \sum_{{\rm zeros}\ n} i(z_n) = V - E + F = \chi(M). \tag{13.147} \]


In the case of a two-dimensional oriented surface equipped with a smooth
metric, it is also simple to demonstrate the invariance of the index sum.
Consider two vector fields $X$ and $Y$. Triangulate $M$ so that all zeros of both
fields lie in the interior of the faces of the simplices. The metric allows us
to compute the angle $\theta$ between $X$ and $Y$ wherever they are both non-zero,
and in particular on the edges of the simplices. For each two-simplex $\sigma$ we
compute the total change $\Delta\theta$ in the angle as we circumnavigate its boundary.
This change is an integral multiple of $2\pi$, with the integer counting the
difference

\[ \sum_{{\rm zeros\ of}\ X \in \sigma} i(z_n) - \sum_{{\rm zeros\ of}\ Y \in \sigma} i(z_n) \tag{13.148} \]

of the indices of the zeros within $\sigma$. On summing over all triangles $\sigma$, each
edge is traversed twice, once in each direction, so $\sum_\sigma \Delta\theta$ vanishes. The total
index of $X$ is therefore the same as that of $Y$.

This pairwise cancellation argument can be extended to non-orientable
surfaces, such as the projective plane. In this case the edges constituting the
homological “boundary” of the closed surface are traversed twice in the same
direction, but the angle $\theta$ at a point on one edge is paired with $-\theta$ at the
corresponding point of the other edge.


Supersymmetric quantum mechanics 


Edward Witten gave a beautiful proof of the Morse index theorem for a closed 
orientable manifold M by re-interpreting the Laplace-Beltrami operator as 
the Hamiltonian of supersymmetric quantum mechanics on M. Witten’s 
idea had a profound impact, and led to quantum physics serving as a rich 
source of inspiration and insight for mathematicians. We have seen most 
of the ingredients of this re-interpretation in previous chapters. Indeed you 
should have experienced a sense of déjà vu when you saw $d$ and $\delta$ mapping
eigenfunctions of one differential operator into eigenfunctions of a related 
operator. 

We begin with a novel way to think of the calculus of differential forms.
We introduce a set of fermion annihilation and creation operators $\psi^\mu$ and
$\psi^{\dagger\mu}$ which anti-commute, $\psi^\mu\psi^\nu = -\psi^\nu\psi^\mu$, and obey the anticommutation
relation

\[ \{\psi^{\dagger\mu}, \psi^\nu\} \equiv \psi^{\dagger\mu}\psi^\nu + \psi^\nu\psi^{\dagger\mu} = g^{\mu\nu}. \tag{13.149} \]

Here, $g^{\mu\nu}$ is the metric tensor, and the Greek indices $\mu$ and $\nu$ range from 1
to $D$. As is usual when we are given annihilation and creation operators,
we also introduce a vacuum state $|0\rangle$ which is killed by all the annihilation
operators: $\psi^\mu|0\rangle = 0$. The states

\[ (\psi^{\dagger 1})^{p_1}(\psi^{\dagger 2})^{p_2}\cdots(\psi^{\dagger D})^{p_D}|0\rangle, \tag{13.150} \]

with each of the $p_i$ taking the value one or zero, then constitute a basis for
a $2^D$-dimensional Hilbert space. We call $p = \sum_i p_i$ the fermion number of the
state. We assume that $\langle 0|0\rangle = 1$ and use the anti-commutation relations to
show that

\[ \langle 0|\psi^{\mu_p}\cdots\psi^{\mu_2}\psi^{\mu_1}\,\psi^{\dagger\nu_1}\psi^{\dagger\nu_2}\cdots\psi^{\dagger\nu_q}|0\rangle \]

is zero unless $p = q$, in which case it is equal to

\[ g^{\mu_1\nu_1}g^{\mu_2\nu_2}\cdots g^{\mu_p\nu_p} \pm ({\rm permutations}). \]
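
The algebra (13.149) can be realized concretely on the $2^D$ occupation-number states. The sketch below is an illustration under stated assumptions: it takes the flat metric $g^{\mu\nu} = \delta^{\mu\nu}$, labels modes $0, \ldots, D-1$, stores a state as a dictionary from occupation bitmask to amplitude, and uses the usual Jordan-Wigner sign convention (a factor $-1$ for each occupied mode of lower label). The function names are our own.

```python
def _sign_below(mask: int, mu: int) -> int:
    """(-1)^(number of occupied modes with label below mu)."""
    return -1 if bin(mask & ((1 << mu) - 1)).count("1") % 2 else 1

def create(mu, state):
    """Apply psi-dagger^mu to a state {occupation bitmask: amplitude}."""
    out = {}
    for mask, amp in state.items():
        if mask & (1 << mu):
            continue  # mode already occupied: (psi-dagger)^2 = 0
        new = mask | (1 << mu)
        out[new] = out.get(new, 0) + _sign_below(mask, mu) * amp
    return out

def annihilate(mu, state):
    """Apply psi^mu to a state {occupation bitmask: amplitude}."""
    out = {}
    for mask, amp in state.items():
        if not mask & (1 << mu):
            continue  # mode empty: psi kills the state
        new = mask & ~(1 << mu)
        out[new] = out.get(new, 0) + _sign_below(mask, mu) * amp
    return out

def add(a, b):
    """Sum two states, dropping components whose amplitude cancels to zero."""
    out = dict(a)
    for mask, amp in b.items():
        out[mask] = out.get(mask, 0) + amp
    return {m: v for m, v in out.items() if v != 0}

def anticommutator(mu, nu, state):
    return add(create(mu, annihilate(nu, state)), annihilate(nu, create(mu, state)))

D = 3
ok = all(
    anticommutator(mu, nu, {mask: 1}) == ({mask: 1} if mu == nu else {})
    for mu in range(D) for nu in range(D) for mask in range(2 ** D)
)
print(ok)
```

Each bitmask with $p$ bits set corresponds to one of the basis states (13.150) with fermion number $p$.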


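We now make the correspondence

\[ \frac{1}{p!}f_{\mu_1\mu_2\ldots\mu_p}(x)\,\psi^{\dagger\mu_1}\psi^{\dagger\mu_2}\cdots\psi^{\dagger\mu_p}|0\rangle \;\leftrightarrow\; \frac{1}{p!}f_{\mu_1\mu_2\ldots\mu_p}(x)\,dx^{\mu_1}dx^{\mu_2}\cdots dx^{\mu_p}, \tag{13.151} \]

to identify $p$-fermion states with $p$-forms. We think of $f_{\mu_1\mu_2\ldots\mu_p}(x)$ as being
the wavefunction of a particle moving on $M$, with the subscripts informing
us there are fermions occupying the states $\mu_i$. It is then natural to take the
inner product of

\[ |a\rangle = \frac{1}{p!}a_{\mu_1\mu_2\ldots\mu_p}(x)\,\psi^{\dagger\mu_1}\cdots\psi^{\dagger\mu_p}|0\rangle \tag{13.152} \]

and

\[ |b\rangle = \frac{1}{q!}b_{\mu_1\mu_2\ldots\mu_q}(x)\,\psi^{\dagger\mu_1}\cdots\psi^{\dagger\mu_q}|0\rangle \tag{13.153} \]

to be

\[ \begin{aligned}
\langle a, b\rangle &= \int_M d^Dx\,\sqrt{g}\;\frac{1}{p!\,q!}\,a^*_{\mu_1\mu_2\ldots\mu_p}b_{\nu_1\nu_2\ldots\nu_q}\,\langle 0|\psi^{\mu_p}\cdots\psi^{\mu_1}\psi^{\dagger\nu_1}\cdots\psi^{\dagger\nu_q}|0\rangle \\
 &= \delta_{pq}\,\frac{1}{p!}\int_M d^Dx\,\sqrt{g}\;a^*_{\mu_1\mu_2\ldots\mu_p}b^{\mu_1\mu_2\ldots\mu_p}.
\end{aligned} \tag{13.154} \]

This coincides with the Hodge inner product of the corresponding forms.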

If we lower the index on $\psi^\mu$ by defining $\psi_\mu$ to be $g_{\mu\nu}\psi^\nu$ then the action
of the annihilation operator $X^\mu\psi_\mu$ on a $p$-fermion state coincides with the
action of the interior multiplication $i_X$ on the corresponding $p$-form. All the
other operations of the exterior calculus can also be expressed in terms of the
$\psi^{\dagger\mu}$’s and $\psi^\mu$’s. In particular, in Cartesian co-ordinates where $g_{\mu\nu} = \delta_{\mu\nu}$, we can
identify $d$ with $\psi^{\dagger\mu}\partial_\mu$. To find the operator that corresponds to the Hodge
$\delta$, we compute

\[ \delta = d^\dagger = (\psi^{\dagger\mu}\partial_\mu)^\dagger = \partial_\mu^\dagger\psi^\mu = -\partial_\mu\psi^\mu = -\psi^\mu\partial_\mu. \tag{13.155} \]

The hermitian adjoint of $\partial_\mu$ is here being taken with respect to the standard
$L^2({\bf R}^D)$ inner product. This computation becomes more complicated
when $g_{\mu\nu}$ becomes position dependent. The adjoint $\partial_\mu^\dagger$ then involves the
derivative of $\sqrt{g}$, and $\psi$ and $\partial_\mu$ no longer commute. For this reason, and
because such complications are inessential for what follows, we will delay
discussing this general case until the end of this section.

Having found a simple formula for $\delta$, it is now automatic to compute

\[ d\delta + \delta d = -\{\psi^{\dagger\mu}, \psi^\nu\}\,\partial_\mu\partial_\nu = -\delta^{\mu\nu}\partial_\mu\partial_\nu = -\nabla^2. \tag{13.156} \]

This is much easier than deriving the same result by using $\delta = (-1)^{Dp+D+1}\star d\star$.
Witten’s fermionic formalism simplifies a number of computations
involving $\delta$, but his real innovation was to consider a deformation of the exterior
calculus by introducing the operators

\[ d_t = e^{-tV(x)}\,d\,e^{tV(x)}, \qquad \delta_t = e^{tV(x)}\,\delta\,e^{-tV(x)}, \tag{13.157} \]

and the $t$-deformed

\[ \Delta_t = d_t\delta_t + \delta_t d_t. \tag{13.158} \]

Here, $V(x)$ is the Morse function whose stationary points we seek to count.

It is easy to see that the deformed derivative continues to obey $d_t^2 = 0$.
We also see that $d\omega = 0$ if and only if $d_t e^{-tV}\omega = 0$. Similarly, if $\omega = d\eta$ then
$e^{-tV}\omega = d_t e^{-tV}\eta$. The cohomology of $d$ is therefore transformed into the
cohomology of $d_t$ by multiplication by $e^{-tV}$. Since the exponential function
is never zero, this correspondence is invertible and the mapping is an
isomorphism. In particular the dimensions of the spaces ${\rm Ker}(d_t)_p/{\rm Im}(d_t)_{p-1}$ are
$t$ independent and coincide with the $t = 0$ Betti numbers $b_p$. Furthermore, the
$t$-deformed Laplace-Beltrami operator remains Fredholm with only positive
or zero eigenvalues. We can therefore make a Hodge decomposition

\[ \omega = d_t\alpha + \delta_t\beta + \gamma, \tag{13.159} \]

where $\Delta_t\gamma = 0$, and conclude that

\[ \dim({\rm Ker}(\Delta_t)_p) = b_p \tag{13.160} \]

as before. The non-zero eigenvalue spaces will also continue to form exact
sequences. Nothing seems to have changed! Why do we introduce $d_t$ then?
The motivation is that when $t$ becomes large we can use our knowledge of
quantum mechanics to compute the Morse index.

To do this, we expand out

\[ \begin{aligned}
d_t &= \psi^{\dagger\mu}(\partial_\mu + t\,\partial_\mu V), \\
\delta_t &= -\psi^\mu(\partial_\mu - t\,\partial_\mu V),
\end{aligned} \tag{13.161} \]

and find

\[ d_t\delta_t + \delta_t d_t = -\nabla^2 + t^2|\nabla V|^2 + t\,[\psi^{\dagger\mu}, \psi^\nu]\,\partial^2_{\mu\nu}V. \tag{13.162} \]

This can be thought of as a Schrödinger Hamiltonian on $M$ containing a
potential $t^2|\nabla V|^2$ and a fermionic term $t\,[\psi^{\dagger\mu}, \psi^\nu]\,\partial^2_{\mu\nu}V$. When $t$ is large and
positive the potential will be large and positive everywhere except near those
points where $\nabla V = 0$. The wavefunctions of all low-energy states, and in
particular all zero-energy states, will therefore be concentrated at precisely
the stationary points we are investigating. Let us focus on a particular
stationary point, which we will take as the origin of our co-ordinate system, and
see if any zero-energy state is localized there. We first rotate the co-ordinate
system about the origin so that the Hessian matrix $\partial^2_{\mu\nu}V|_0$ becomes diagonal
with eigenvalues $\lambda_i$. The Schrödinger problem can then be approximated by
a sum of harmonic oscillator Hamiltonians

\[ \Delta_{p,t} \approx \sum_{i=1}^{D}\left\{ -\frac{\partial^2}{\partial x_i^2} + t^2\lambda_i^2x_i^2 + t\lambda_i[\psi^{\dagger i}, \psi^i] \right\}. \tag{13.163} \]

The commutator $[\psi^{\dagger i}, \psi^i]$ takes the value $+1$ if the $i$’th fermion state is
occupied, and $-1$ if it is not. The spectrum of the approximate Hamiltonian
is therefore

\[ t\sum_{i=1}^{D}\left\{ |\lambda_i|(1 + 2n_i) \pm \lambda_i \right\}. \tag{13.164} \]

Here the $n_i$ label the harmonic oscillator states. The lowest-energy states
will have all the $n_i = 0$. To get a state with zero energy we must arrange
for the $\pm$ sign to be negative (no fermion in state $i$) whenever $\lambda_i$ is positive,
and to be positive (fermion state $i$ occupied) whenever $\lambda_i$ is negative. The
fermion number “$p$” of the zero-energy state is therefore equal to the number
of negative $\lambda_i$, i.e. to the index of the critical point! We can, in this
manner, find one zero-energy state for each critical point. All other states
have energies proportional to $t$, and therefore large. Since the number of
zero-energy states having fermion number $p$ is the Betti number $b_p$, the harmonic
oscillator approximation suggests that $b_p = N_p$.
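
The counting argument based on (13.164) can be spelled out by enumerating all fermion occupations at a single critical point. The sketch below, with function names and sample Hessian eigenvalues of our own choosing, sets every $n_i = 0$, evaluates $t\sum_i\{|\lambda_i| \pm \lambda_i\}$ for each of the $2^D$ occupations, and reports the fermion number of any zero-energy state.

```python
from itertools import product

def ground_energies(hessian_eigenvalues, t=1.0):
    """Energies t * sum(|lam_i| + s_i * lam_i) with all n_i = 0, where s_i = +1
    if fermion state i is occupied and -1 if it is empty.
    Returns a list of (energy, fermion number) pairs."""
    results = []
    for occupation in product((0, 1), repeat=len(hessian_eigenvalues)):
        energy = t * sum(abs(lam) + (1 if occ else -1) * lam
                         for lam, occ in zip(hessian_eigenvalues, occupation))
        results.append((energy, sum(occupation)))
    return results

def zero_mode_fermion_numbers(hessian_eigenvalues):
    """Fermion numbers of the zero-energy states at one critical point."""
    return [p for energy, p in ground_energies(hessian_eigenvalues) if energy == 0]

# Illustrative critical point with one positive and two negative eigenvalues.
print(zero_mode_fermion_numbers([2.0, -1.0, -3.0]))
```

Exactly one occupation gives zero energy, and its fermion number equals the number of negative eigenvalues, that is, the Morse index.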
If we could trust our computation of the energy spectrum, we would have 
established the Morse theorem 


S-(-1)?Np = Yo(-1)Pbp = x(M), (13.165) 


by having the two sums agree term by term. Our computation is only ap- 
proximate, however. While there can be no more zero-energy states than 
those we have found, some states that appear to be zero modes may instead 
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have small positive energy. This might arise from tunnelling between the 
different potential minima, or from the higher-order corrections to the har- 
monic oscillator potentials, both effects we have neglected. We can therefore 
only be confident that 


N, > bp. (13.166) 


The remarkable thing is that, for the Morse index, this does not matter! If 
one of our putative zero modes gains a small positive energy, it is now in 
the non-zero eigenvalue sector of the spectrum. The exact-sequence property 
therefore tells us that one of the other putative zero modes must also be a 
not-quite-zero mode state with exactly the same energy. This second state 
will have a fermion number that differs from the first by plus or minus one. 
An error in counting the zero energy states therefore cancels out when we 
take the alternating sum. Our unreliable estimate $b_p \approx N_p$ has thus provided
us with an exact computation of the Morse index. 
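
The cancellation in the alternating sum can be illustrated with a small numerical sketch. The Betti numbers below are the standard ones for the sphere and the torus; the Morse counts are illustrative height functions chosen for this example (they are assumptions, not taken from the text): a round sphere, a sphere with an extra hump (one additional saddle and one additional maximum), and an upright torus.

```python
def alternating_sum(counts):
    """Return sum_p (-1)^p counts[p]."""
    return sum((-1) ** p * n for p, n in enumerate(counts))

# Betti numbers b_p of the two surfaces.
betti = {"sphere": [1, 0, 1], "torus": [1, 2, 1]}

# Illustrative Morse counts N_p (number of critical points of index p).
morse = {
    "sphere, round": ("sphere", [1, 0, 1]),       # one minimum, one maximum
    "sphere, two-humped": ("sphere", [1, 1, 2]),  # extra saddle and extra maximum
    "torus, upright": ("torus", [1, 2, 1]),       # minimum, two saddles, maximum
}

for name, (surface, counts) in morse.items():
    b = betti[surface]
    # Term by term we only have N_p >= b_p ...
    assert all(n >= bp for n, bp in zip(counts, b))
    # ... but the alternating sums agree.
    print(name, alternating_sum(counts), alternating_sum(b))
```

In the two-humped case the surplus critical points have indices 1 and 2, which differ by one, so their contributions cancel in the alternating sum, just as the paired not-quite-zero modes do.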

We have described Witten’s argument as if the manifold M were flat. 
When the manifold M is not flat, however, the curvature will not affect 
our computations. Once the parameter t is large, the low-energy eigenfunc- 
tions will be so tightly localized about the critical points that they will be 
hard-pressed to detect the curvature. Even if the curvature can effect an 
infinitesimal energy shift, the exact-sequence argument again shows that this
does not affect the alternating sum. 


The Weitzenböck formula


Although we were able to evade them when proving the Morse index
theorem, it is interesting to uncover the workings of the nitty-gritty
Riemann tensor index machinery that lie concealed behind the polished facade
of Hodge's $d$, $\delta$ calculus.

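Let us assume that our manifold M is equipped with a torsion-free
connection $\Gamma^\mu{}_{\nu\lambda} = \Gamma^\mu{}_{\lambda\nu}$, and use this connection to define the
action of an operator $\hat\nabla_\mu$ by specifying its commutators with c-number
functions f, and with the $\psi^{\dagger\mu}$ and $\psi^\mu$'s:

$$\begin{aligned}
[\hat\nabla_\mu, f] &= \partial_\mu f, \\
[\hat\nabla_\mu, \psi^{\dagger\nu}] &= -\Gamma^\nu{}_{\mu\lambda}\psi^{\dagger\lambda}, \\
[\hat\nabla_\mu, \psi^{\nu}] &= -\Gamma^\nu{}_{\mu\lambda}\psi^{\lambda}.
\end{aligned} \tag{13.167}$$

We also set $\hat\nabla_\mu|0\rangle = 0$. These rules allow us to compute the action of $\hat\nabla_\mu$ on
$f_{\mu_1\mu_2\ldots\mu_p}(x)\,\psi^{\dagger\mu_1}\cdots\psi^{\dagger\mu_p}|0\rangle$. For example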


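$$\begin{aligned}
\hat\nabla_\mu\left(f_\nu\psi^{\dagger\nu}|0\rangle\right)
&= \left([\hat\nabla_\mu, f_\nu\psi^{\dagger\nu}] + f_\nu\psi^{\dagger\nu}\hat\nabla_\mu\right)|0\rangle \\
&= \left([\hat\nabla_\mu, f_\nu]\psi^{\dagger\nu} + f_\nu[\hat\nabla_\mu, \psi^{\dagger\nu}]\right)|0\rangle \\
&= \left(\partial_\mu f_\nu - f_\lambda\Gamma^\lambda{}_{\mu\nu}\right)\psi^{\dagger\nu}|0\rangle \\
&= (\nabla_\mu f_\nu)\,\psi^{\dagger\nu}|0\rangle,
\end{aligned} \tag{13.168}$$

where

$$\nabla_\mu f_\nu = \partial_\mu f_\nu - \Gamma^\lambda{}_{\mu\nu}f_\lambda \tag{13.169}$$

is the usual covariant derivative acting on the components of a covariant
vector.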

The metric $g^{\mu\nu}$ counts as a c-number function, and so $[\hat\nabla_\alpha, g^{\mu\nu}]$ is not
zero, but is instead $\partial_\alpha g^{\mu\nu}$. This might be disturbing, since being able to
pass the metric through a covariant derivative is a basic compatibility
condition in Riemann geometry, but all is not lost. $\hat\nabla_\mu$ (with a caret) is not
quite the same beast as $\nabla_\mu$. We proceed as follows:

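$$\begin{aligned}
\partial_\alpha g^{\mu\nu} &= [\hat\nabla_\alpha, g^{\mu\nu}] \\
&= [\hat\nabla_\alpha, \{\psi^{\dagger\mu}, \psi^\nu\}] \\
&= \{[\hat\nabla_\alpha, \psi^{\dagger\mu}], \psi^\nu\} + \{\psi^{\dagger\mu}, [\hat\nabla_\alpha, \psi^\nu]\} \\
&= -\{\Gamma^\mu{}_{\alpha\lambda}\psi^{\dagger\lambda}, \psi^\nu\} - \{\psi^{\dagger\mu}, \Gamma^\nu{}_{\alpha\lambda}\psi^\lambda\} \\
&= -\Gamma^\mu{}_{\alpha\lambda}g^{\lambda\nu} - \Gamma^\nu{}_{\alpha\lambda}g^{\mu\lambda}.
\end{aligned} \tag{13.170}$$

Thus, we conclude that

$$\partial_\alpha g^{\mu\nu} + \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\nu} + \Gamma^\nu{}_{\alpha\lambda}g^{\mu\lambda} = \nabla_\alpha g^{\mu\nu} = 0. \tag{13.171}$$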


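Metric compatibility is therefore satisfied, and the connection is therefore the
standard Riemannian

$$\Gamma^\alpha{}_{\mu\nu} = \frac{1}{2}g^{\alpha\lambda}\left(\partial_\mu g_{\lambda\nu} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu}\right). \tag{13.172}$$

Knowing this, we can compute the adjoint of $\hat\nabla_\mu$:

$$\begin{aligned}
(\hat\nabla_\mu)^\dagger &= -\frac{1}{\sqrt{g}}\hat\nabla_\mu\sqrt{g} \\
&= -\left(\hat\nabla_\mu + \partial_\mu\ln\sqrt{g}\right) \\
&= -\left(\hat\nabla_\mu + \Gamma^\nu{}_{\nu\mu}\right).
\end{aligned} \tag{13.173}$$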


That $\Gamma^\nu{}_{\nu\mu}$ is the logarithmic derivative of $\sqrt{g}$ is a standard identity for the
Riemann connection (see exercise 11.14). The resultant formula for $(\hat\nabla_\mu)^\dagger$
can be used to verify that the second and third equations in (13.167) are
compatible with each other.

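We can also compute $[[\hat\nabla_\mu, \hat\nabla_\nu], \psi^\sigma]$, and from it deduce that

$$[\hat\nabla_\mu, \hat\nabla_\nu] = R_{\sigma\lambda\mu\nu}\,\psi^\sigma\psi^{\dagger\lambda}, \tag{13.174}$$

where

$$R^\alpha{}_{\beta\mu\nu} = \partial_\mu\Gamma^\alpha{}_{\beta\nu} - \partial_\nu\Gamma^\alpha{}_{\beta\mu} + \Gamma^\alpha{}_{\lambda\mu}\Gamma^\lambda{}_{\beta\nu} - \Gamma^\alpha{}_{\lambda\nu}\Gamma^\lambda{}_{\beta\mu} \tag{13.175}$$

is the Riemann curvature tensor.
We now define d to be

$$d = \psi^{\dagger\mu}\hat\nabla_\mu. \tag{13.176}$$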


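Its action coincides with the usual d because the symmetry of the $\Gamma^\alpha{}_{\mu\nu}$'s
ensures that their contributions cancel. From this we find that $\delta$ is

$$\begin{aligned}
\delta &= \left(\psi^{\dagger\mu}\hat\nabla_\mu\right)^\dagger \\
&= \hat\nabla_\mu^\dagger\,\psi^\mu \\
&= -\left(\hat\nabla_\mu + \Gamma^\nu{}_{\mu\nu}\right)\psi^\mu \\
&= -\psi^\mu\left(\hat\nabla_\mu + \Gamma^\nu{}_{\mu\nu}\right) + \Gamma^\mu{}_{\mu\nu}\psi^\nu \\
&= -\psi^\mu\hat\nabla_\mu.
\end{aligned} \tag{13.177}$$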


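The Laplace-Beltrami operator can now be worked out as

$$\begin{aligned}
d\delta + \delta d &= -\left(\psi^{\dagger\mu}\hat\nabla_\mu\,\psi^\nu\hat\nabla_\nu + \psi^\nu\hat\nabla_\nu\,\psi^{\dagger\mu}\hat\nabla_\mu\right) \\
&= -\left(\{\psi^{\dagger\mu}, \psi^\nu\}\left(\hat\nabla_\mu\hat\nabla_\nu - \Gamma^\sigma{}_{\mu\nu}\hat\nabla_\sigma\right) + \psi^\nu\psi^{\dagger\mu}[\hat\nabla_\nu, \hat\nabla_\mu]\right) \\
&= -\left(g^{\mu\nu}\left(\hat\nabla_\mu\hat\nabla_\nu - \Gamma^\sigma{}_{\mu\nu}\hat\nabla_\sigma\right) + \psi^\nu\psi^{\dagger\mu}\psi^\sigma\psi^{\dagger\lambda}R_{\sigma\lambda\nu\mu}\right).
\end{aligned} \tag{13.178}$$

By making use of the symmetries $R_{\sigma\lambda\nu\mu} = R_{\nu\mu\sigma\lambda}$ and $R_{\sigma\lambda\nu\mu} = -R_{\lambda\sigma\nu\mu}$ we
can tidy up the curvature term to get

$$d\delta + \delta d = -g^{\mu\nu}\left(\hat\nabla_\mu\hat\nabla_\nu - \Gamma^\sigma{}_{\mu\nu}\hat\nabla_\sigma\right) - \psi^{\dagger\alpha}\psi^\beta\psi^{\dagger\mu}\psi^\nu R_{\alpha\beta\mu\nu}. \tag{13.180}$$

This result is called the Weitzenböck formula. An equivalent formula can
be derived directly from (13.124), but only with a great deal more effort.
The part without the curvature tensor is called the Bochner Laplacian. It is
normally written as $B = -g^{\mu\nu}\nabla_\mu\nabla_\nu$ with $\nabla_\mu$ being understood to be acting
on the index $\nu$, and therefore tacitly containing the extra $\Gamma^\sigma{}_{\mu\nu}$ that must be
made explicit, as we have in (13.180), when we define the action of $\hat\nabla_\mu$
via commutators. The Bochner Laplacian can also be written as

$$B = \hat\nabla_\mu^\dagger\, g^{\mu\nu}\,\hat\nabla_\nu, \tag{13.181}$$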


which shows that it is a positive operator. 


13.8 Further exercises and problems 


Exercise 13.7: Let

$$A = A_x\,dx + A_y\,dy + A_z\,dz$$

be a closed form in $\mathbb{R}^3$. Use the formula (13.6) of section 13.2.1 to find a
scalar $\varphi(x, y, z)$ such that $A = d\varphi$. Compute the exterior derivative from your
expression for $\varphi$ and verify that it reconstitutes A.


Exercise 13.8: By considering the example of the unit disc in two dimensions,
show that the condition of being closed, in the sense of having no boundary,
is a necessary condition in the statement of Poincaré duality. What
goes wrong with our construction of the elements of $H^{D-p}(M)$ from cycles in
$H_p(M)$ in this case?


Exercise 13.9: Use Poincaré duality to show that the Euler character of
any odd-dimensional closed manifold is zero.


Chapter 14 


Groups and Group 
Representations 


Groups usually appear in physics as symmetries of the system or model we 
are studying. Often the symmetry operation involves a linear transformation, 
and this naturally leads to the idea of finding sets of matrices having the same 
multiplication table as the group. These sets are called representations of 
the group. Given a group, we endeavour to find and classify all possible 
representations. 


14.1 Basic ideas 


We begin with a rapid review of basic group theory. 


14.1.1 Group axioms 


A group G is a set with a binary operation that assigns to each ordered pair
$(g_1, g_2)$ of elements a third element, $g_3$, usually written with multiplicative
notation as $g_3 = g_1g_2$. The binary operation, or product, obeys the following
rules:
i) Associativity: $g_1(g_2g_3) = (g_1g_2)g_3$.
ii) Existence of an identity: There is an element¹ e ∈ G such that eg = g
for all g ∈ G.
iii) Existence of an inverse: For each g ∈ G there is an element $g^{-1}$ such
that $g^{-1}g = e$.

¹ The symbol “e” is often used for the identity element, from the German
Einheit, meaning “unity.”

From these axioms there follow some conclusions that are so basic that 
they are often included in the axioms themselves, but since they are not 
independent, we state them as corollaries. 


Corollary i): $gg^{-1} = e$.

Proof: Start from $g^{-1}g = e$, and multiply on the right by $g^{-1}$ to get
$g^{-1}gg^{-1} = eg^{-1} = g^{-1}$, where we have used the left identity property of
e at the last step. Now multiply on the left by $(g^{-1})^{-1}$, and use associativity
to get $gg^{-1} = e$.

Corollary ii): ge = g.

Proof: Write $ge = g(g^{-1}g) = (gg^{-1})g = eg = g$.

Corollary iii): The identity e is unique.

Proof: Suppose there is another element $e_1$ such that $e_1g = eg = g$. Multiply
on the right by $g^{-1}$ to get $e_1e = e^2 = e$, but $e_1e = e_1$, so $e_1 = e$.

Corollary iv): The inverse of a given element g is unique.

Proof: Let $g_1g = g_2g = e$. Use the result of corollary (i), that any left inverse
is also a right inverse, to multiply on the right by $g_1$, and so find that $g_1 = g_2$.


Two elements $g_1$ and $g_2$ are said to commute if $g_1g_2 = g_2g_1$. If the group
has the property that $g_1g_2 = g_2g_1$ for all $g_1, g_2 \in G$, it is said to be Abelian,
otherwise it is non-Abelian.

If the set G contains only finitely many elements, the group G is said to
be finite. The number of elements in the group, |G|, is called the order of
the group.


Examples of groups: 


1) The integers $\mathbb{Z}$ under addition. The binary operation is (n, m) ↦ n + m,
and “0” plays the role of the identity element. This is not a finite group.

2) The integers modulo n under addition: (m, m′) ↦ m + m′, mod n. This
group is denoted by $\mathbb{Z}_n$, and is finite.

3) The non-zero integers modulo p (a prime) under multiplication: (m, m′) ↦
mm′, mod p. Here “1” is the identity element. If the modulus is not
a prime number, we do not get a group (why not?). This group is
sometimes denoted by $(\mathbb{Z}_p)^\times$.

4) The set of numbers {2, 4, 6, 8} under multiplication modulo 10. Here, the
number “6” plays the role of the identity!

5) The set of functions

$$f_1(z) = z, \qquad f_2(z) = \frac{1}{1-z}, \qquad f_3(z) = \frac{z-1}{z},$$
$$f_4(z) = \frac{1}{z}, \qquad f_5(z) = 1 - z, \qquad f_6(z) = \frac{z}{z-1},$$

with $(f_i, f_j) \mapsto f_i \circ f_j$. Here, the “∘” is a standard notation for
composition of functions: $(f_i \circ f_j)(z) = f_i(f_j(z))$.

6) The set of rotations in three dimensions, equivalently the set of 3-by-3
real matrices O, obeying $O^TO = I$ and det O = 1. This is the group
SO(3). SO(n) is defined analogously as the group of rotations in n
dimensions. If we relax the condition on the determinant we get the
orthogonal group O(n). Both SO(n) and O(n) are examples of Lie
groups. A Lie group is a group that is also a manifold M, and whose
multiplication law is a smooth function M × M → M.

7) Groups are often specified by giving a list of generators and relations.
For example the cyclic group of order n, denoted by $C_n$, is specified by
giving the generator a and relation $a^n = e$. Similarly, the dihedral group
$D_n$ has two generators a, b and relations $a^n = e$, $b^2 = e$, $(ab)^2 = e$.
This group has order 2n.
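
Some of the finite examples above can be checked by brute force. The following is a minimal sketch, not part of the original text: the helper `find_identity_if_group` and the variable names are illustrative. It tests closure, associativity, a left identity, and left inverses exactly as in the axioms, and identifies the compositions of the six functions by their values at a few rational sample points (an assumption that is adequate here because the six functions take distinct values at those points).

```python
from fractions import Fraction
from itertools import product

def find_identity_if_group(elements, op):
    """Return the identity if (elements, op) obeys the group axioms, else None."""
    elements = list(elements)
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return None  # not closed
    if any(op(a, op(b, c)) != op(op(a, b), c)
           for a, b, c in product(elements, repeat=3)):
        return None  # not associative
    for e in elements:
        if all(op(e, g) == g for g in elements):              # left identity
            if all(any(op(h, g) == e for h in elements) for g in elements):
                return e                                       # left inverses exist
    return None

# Example 4: {2, 4, 6, 8} under multiplication modulo 10.
mod10 = find_identity_if_group([2, 4, 6, 8], lambda a, b: a * b % 10)
# Example 3: non-zero residues modulo a prime (7) and modulo a non-prime (6).
mod7 = find_identity_if_group(range(1, 7), lambda a, b: a * b % 7)
mod6 = find_identity_if_group(range(1, 6), lambda a, b: a * b % 6)

# Example 5: the six functions f_1, ..., f_6 (stored with indices 0..5).
funcs = [
    lambda z: z,
    lambda z: 1 / (1 - z),
    lambda z: (z - 1) / z,
    lambda z: 1 / z,
    lambda z: 1 - z,
    lambda z: z / (z - 1),
]
samples = [Fraction(2), Fraction(3), Fraction(5)]
tables = [tuple(f(z) for z in samples) for f in funcs]

def compose(i, j):
    """Index of f_i o f_j, identified by its values at the sample points."""
    return tables.index(tuple(funcs[i](funcs[j](z)) for z in samples))

six_functions = find_identity_if_group(range(6), compose)
print(mod10, mod7, mod6, six_functions)
```

The non-prime modulus fails already at closure, since 2 × 3 = 0 mod 6 is not in the set, which is one way to answer the “why not?” in example 3.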


14.1.2 Elementary properties 


Here are the basic properties of groups that we need: 


i) Subgroups: If a subset of elements of a group forms a group, it is
called a subgroup. For example, $\mathbb{Z}_{12}$ has a subgroup consisting of
{0, 3, 6, 9}. Any group G possesses at least two subgroups: the entirety
of G itself, and the subgroup containing only the identity element {e}.
These are known as the trivial subgroups. Any other subgroups are
called proper subgroups.

ii) Cosets: Given a subgroup H ⊂ G, having elements $\{h_1, h_2, \ldots\}$, and
an element g ∈ G, we form the (left) coset $gH = \{gh_1, gh_2, \ldots\}$. If two
cosets $g_1H$ and $g_2H$ intersect, they coincide. (Proof: if $g_1h_1 = g_2h_2$,
then $g_2 = g_1(h_1h_2^{-1})$ and so $g_1H = g_2H$.) If H is a finite group,
each coset has the same number of distinct elements as H. (Proof: if
$gh_1 = gh_2$ then left multiplication by $g^{-1}$ shows that $h_1 = h_2$.) If the
order of G is also finite, the group G is decomposed into an integer
number of cosets,

$$G = g_1H + g_2H + \cdots + g_nH, \tag{14.1}$$

where “+” denotes the union of disjoint sets. From this we see that the
order of H must divide the order of G. This result is called Lagrange's
theorem. The set whose elements are the cosets is denoted by G/H.

iii) Normal subgroups: A subgroup $H = \{h_1, h_2, \ldots\}$ of G is said to be
normal, or invariant, if $g^{-1}Hg = H$ for all g ∈ G. This notation means
that the set of elements $g^{-1}Hg = \{g^{-1}h_1g, g^{-1}h_2g, \ldots\}$ coincides with
H, or equivalently that the map $h \mapsto g^{-1}hg$ does not take h ∈ H out
of H, but simply scrambles the order of the elements of H.

iv) Quotient groups: Given a normal subgroup H, we can define a
multiplication rule on the set of cosets $G/H = \{g_1H, g_2H, \ldots\}$ by taking a
representative element from each of $g_iH$ and $g_jH$, taking the product
of these elements, and defining $(g_iH)(g_jH)$ to be the coset in which this
product lies. This coset is independent of the representative elements
chosen (this would not be so were the subgroup not normal). The
resulting group is called the quotient group of G by H, and is denoted by
G/H. (Note that the symbol “G/H” is used to denote both the set of
cosets, and, when it exists, the group whose elements are these cosets.)

v) Simple groups: A group G with no proper normal subgroups is said to be
simple. The finite simple groups have been classified. They fall into various
infinite families (cyclic groups of prime order, alternating groups, 16 families
of Lie type) together with 26 sporadic groups, the largest of which, the
Monster, has order
808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000.
The mysterious “Monstrous moonshine” links its representation theory to
the elliptic modular function $J(\tau)$ and to string theory.

vi) Conjugacy and Conjugacy Classes: Two group elements $g_1$, $g_2$ are said
to be conjugate in G if there is an element g ∈ G such that $g_2 = g^{-1}g_1g$.
If $g_1$ is conjugate to $g_2$, we write $g_1 \sim g_2$. Conjugacy is an equivalence
relation,² and, for finite groups, the resulting conjugacy classes have
orders that divide the order of G. To see this, consider the conjugacy
class containing the element g. Observe that the set H of elements
h ∈ G such that $h^{-1}gh = g$ forms a subgroup. The set of elements
conjugate to g can be identified with the coset space G/H. The order
of G divided by the order of the conjugacy class is therefore |H|.

² An equivalence relation, ∼, is a binary relation that is
i) Reflexive: A ∼ A.
ii) Symmetric: A ∼ B ⟺ B ∼ A.
iii) Transitive: A ∼ B, B ∼ C ⟹ A ∼ C.
Such a relation breaks a set up into disjoint equivalence classes.
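
The coset decomposition and the divisibility of conjugacy-class sizes can be checked on small groups. The sketch below is illustrative only (the helper names are not from the text): it decomposes $\mathbb{Z}_{12}$ into cosets of {0, 3, 6, 9}, and computes the conjugacy classes of $S_3$, with permutations stored as tuples `p` where `p[i]` is the image of `i`.

```python
from itertools import permutations

def left_cosets(group, subgroup, op):
    """Return the distinct left cosets gH as a list of frozensets."""
    cosets = []
    for g in group:
        coset = frozenset(op(g, h) for h in subgroup)
        if coset not in cosets:
            cosets.append(coset)
    return cosets

# Z_12 under addition, with the subgroup H = {0, 3, 6, 9}.
z12 = list(range(12))
H = [0, 3, 6, 9]
cosets = left_cosets(z12, H, lambda a, b: (a + b) % 12)

def compose(p, q):
    """(p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_class(g, group):
    """All elements h^{-1} g h with h in the group."""
    return frozenset(compose(inverse(h), compose(g, h)) for h in group)

s3 = list(permutations(range(3)))
classes = {conjugacy_class(g, s3) for g in s3}
class_sizes = sorted(len(c) for c in classes)
print(len(cosets), class_sizes)
```

The three cosets each contain four elements and together exhaust $\mathbb{Z}_{12}$, and the class sizes found for $S_3$ all divide 6.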


Example: In the rotation group SO(3), the conjugacy classes are the sets of 
rotations through the same angle, but about different axes. 

Example: In the group U(n), of n-by-n unitary matrices, the conjugacy 
classes are the set of matrices possessing the same eigenvalues. 

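Example: Permutations. The permutation group on n objects, $S_n$, has order
n!. Suppose we consider permutations $\pi_1$, $\pi_2$ in $S_8$ such that $\pi_1$ maps

$$\pi_1: \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 2 & 3 & 1 & 5 & 4 & 7 & 6 & 8 \end{pmatrix}$$

and $\pi_2$ maps

$$\pi_2: \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 2 & 3 & 4 & 5 & 6 & 7 & 8 & 1 \end{pmatrix}$$

The product $\pi_2 \circ \pi_1$ then takes

$$\pi_2 \circ \pi_1: \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 3 & 4 & 2 & 6 & 5 & 8 & 7 & 1 \end{pmatrix}$$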


We can write these permutations out more compactly by using Paolo Ruffini's
cycle notation:

$$\pi_1 = (123)(45)(67)(8), \qquad \pi_2 = (12345678), \qquad \pi_2 \circ \pi_1 = (132468)(5)(7).$$

In this notation, each number is mapped to the one immediately to its right,
with the last number in each bracket, or cycle, wrapping round to map to
the first. Thus $\pi_1(1) = 2$, $\pi_1(2) = 3$, $\pi_1(3) = 1$. The “8”, being both first
and last in its cycle, maps to itself: $\pi_1(8) = 8$. Any permutation with this
cycle pattern, (∗∗∗)(∗∗)(∗∗)(∗), is in the same conjugacy class as $\pi_1$. We
say that $\pi_1$ possesses one 1-cycle, two 2-cycles, and one 3-cycle. The class
$(r_1, r_2, \ldots, r_n)$ having $r_1$ 1-cycles, $r_2$ 2-cycles etc., where $r_1 + 2r_2 + \cdots + nr_n = n$,
contains

$$N_{(r_1, r_2, \ldots)} = \frac{n!}{1^{r_1}(r_1!)\,2^{r_2}(r_2!)\cdots n^{r_n}(r_n!)}$$

elements. The sign of the permutation,

$$\operatorname{sgn}\pi = \epsilon_{\pi(1)\pi(2)\pi(3)\ldots\pi(n)},$$

is equal to

$$\operatorname{sgn}\pi = (+1)^{r_1}(-1)^{r_2}(+1)^{r_3}(-1)^{r_4}\cdots.$$

We have, for any two permutations $\pi_1$, $\pi_2$,

$$\operatorname{sgn}(\pi_1)\operatorname{sgn}(\pi_2) = \operatorname{sgn}(\pi_1 \circ \pi_2),$$

so the even (sgn π = +1) permutations form an invariant subgroup called
the Alternating group, $A_n$. The group $A_n$ is simple for n ≥ 5, and Ruffini
(1801) showed that this simplicity prevents the solution of the general
quintic by radicals. His work was ignored, however, and later independently
rediscovered by Abel (1824) and Galois (1829).
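
The permutation example can be reproduced mechanically. The following sketch (function names are illustrative, not from the text) stores a permutation as a dictionary on 1..n, composes $\pi_2 \circ \pi_1$, extracts the cycles, and evaluates the class-size and sign formulas quoted above.

```python
from collections import Counter
from math import factorial

def compose(p2, p1):
    """(p2 o p1)(k) = p2(p1(k)); permutations are dicts on 1..n."""
    return {k: p2[p1[k]] for k in p1}

def cycles(p):
    """Cycle decomposition, each cycle starting from its smallest entry."""
    seen, result = set(), []
    for start in sorted(p):
        if start in seen:
            continue
        cycle, k = [], start
        while k not in seen:
            seen.add(k)
            cycle.append(k)
            k = p[k]
        result.append(tuple(cycle))
    return result

def cycle_type(p):
    """Counter mapping cycle length k to r_k, the number of k-cycles."""
    return Counter(len(c) for c in cycles(p))

def class_size(n, r):
    """n! / prod_k k^{r_k} r_k!  for the cycle type r."""
    denominator = 1
    for k, r_k in r.items():
        denominator *= k ** r_k * factorial(r_k)
    return factorial(n) // denominator

def sign(p):
    """Each k-cycle contributes a factor (-1)^(k-1)."""
    return (-1) ** sum(len(c) - 1 for c in cycles(p))

pi1 = dict(zip(range(1, 9), [2, 3, 1, 5, 4, 7, 6, 8]))
pi2 = dict(zip(range(1, 9), [2, 3, 4, 5, 6, 7, 8, 1]))
product = compose(pi2, pi1)

print(cycles(product))                   # the cycles of pi2 o pi1
print(class_size(8, cycle_type(pi1)))    # size of the class of pi1
print(sign(pi1), sign(pi2), sign(product))
```

For the cycle type of $\pi_1$ the formula gives 8!/(3 · 2² · 2!) = 1680 permutations in the class.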

If we write out the group elements in some order $\{e, g_1, g_2, \ldots\}$, and then
multiply on the left

$$g\{e, g_1, g_2, \ldots\} = \{g, gg_1, gg_2, \ldots\},$$

then the ordered list $\{g, gg_1, gg_2, \ldots\}$ is a permutation of the original list.
Any group G is therefore isomorphic to a subgroup of the permutation group
$S_{|G|}$. This result is called Cayley's Theorem. Cayley's theorem arguably held
up the development of group theory for many years by its suggestion that
permutations were the only groups worthy of study.


Exercise 14.1: Let $H_1$, $H_2$ be two subgroups of a group G. Show that $H_1 \cap H_2$
is also a subgroup.


Exercise 14.2: Let G be any group.


a) The subset Z(G) of G consisting of those g ∈ G that commute with all
other elements of the group is called the centre of the group. Show that
Z(G) is a subgroup of G.

b) If g is an element of G, the set $C_G(g)$ of elements of G that commute
with g is called the centralizer of g in G. Show that it is a subgroup of
G.

c) If H is a subgroup of G, the set of elements of G that commute with
all elements of H is the centralizer $C_G(H)$ of H in G. Show that it is a
subgroup of G.

d) If H is a subgroup of G, the set $N_G(H) \subset G$ consisting of those g such
that $g^{-1}Hg = H$ is called the normalizer of H in G. Show that $N_G(H)$
is a subgroup of G, and that H is a normal subgroup of $N_G(H)$.


Exercise 14.3: Show that the set of powers $g_0^n$ of an element $g_0 \in G$ form a
subgroup. Now, let p be a prime number. Recall that the set {1, 2, ..., p − 1}
forms the group $(\mathbb{Z}_p)^\times$ under multiplication modulo p. By appealing to
Lagrange's theorem, prove Fermat's little theorem that for any prime p, and
positive integer a that is not divisible by p, we have $a^{p-1} = 1$, mod p. (Fermat
actually used the binomial theorem to show that $a^p = a$, mod p for any a,
divisible by p or not.)


Exercise 14.4: Use Fermat's theorem from the previous exercise to establish
the mathematical identity underlying the RSA algorithm for public-key
cryptography: Let p, q be prime and N = pq. First, use Euclid's algorithm for the
highest common factor (HCF) of two numbers to show that if the integer e is
co-prime to³ (p − 1)(q − 1), then there is an integer d such that

$$de = 1, \mod (p-1)(q-1).$$

Then show that if

$$C = M^e, \mod N, \qquad \text{(encryption)}$$

then

$$M = C^d, \mod N. \qquad \text{(decryption)}$$

The numbers e and N can be made known to the public, but it is hard to find
the secret decoding key, d, unless the factors p and q of N are known.
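
A numerical illustration of the identity (not a proof, and not a secure implementation) can be run with toy numbers. The primes, the exponent e and the message M below are arbitrary small values chosen for this sketch; the modular inverse uses Python's built-in three-argument `pow` with exponent −1, which requires Python 3.8 or later.

```python
from math import gcd

# Toy parameters, far too small for real cryptography.
p, q = 61, 53
N = p * q
phi = (p - 1) * (q - 1)

e = 17
assert gcd(e, phi) == 1          # e is co-prime to (p-1)(q-1)
d = pow(e, -1, phi)              # d e = 1, mod (p-1)(q-1)

M = 65                           # the "message", an integer less than N
C = pow(M, e, N)                 # encryption: C = M^e mod N
recovered = pow(C, d, N)         # decryption: M = C^d mod N

# Fermat's little theorem for the prime p: a^(p-1) = 1 mod p.
fermat_holds = all(pow(a, p - 1, p) == 1 for a in range(1, p))

print(d, C, recovered, fermat_holds)
```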


Exercise 14.5: Consider the group G with multiplication table shown in
Table 14.1.


Table 14.1: Multiplication table of G. To find AB look in row A column B.


³ Has no factors in common with.

This group has a proper subgroup H = {I, A, B}, and corresponding (left)
cosets are IH = {I, A, B} and CH = {C, D, E}.


i) Construct the conjugacy classes of this group.
ii) Show that {I, A, B} and {C, D, E} are indeed the left cosets of H.
iii) Determine whether H is a normal subgroup.
iv) If so, construct the group multiplication table for the corresponding
quotient group.


Exercise 14.6: Let H and K be groups. Make the Cartesian product G =
H × K into a group by introducing a multiplication rule ∗ for elements of the
Cartesian product by setting:

$$(h_1, k_1) * (h_2, k_2) = (h_1h_2, k_1k_2).$$

Show that G, equipped with ∗ as its product, satisfies the group axioms. The
resultant group is called the direct product of H and K.


Exercise 14.7: If F and G are groups, a map φ : F → G that preserves the
group structure, i.e. if $\varphi(g_1)\varphi(g_2) = \varphi(g_1g_2)$, is called a group homomorphism.
If φ is such a homomorphism show that $\varphi(e_F) = e_G$, where $e_F$ and $e_G$ are
the identity elements in F, G respectively.


Exercise 14.8: If φ : F → G is a group homomorphism, and if we define
Ker(φ) as the set of elements f ∈ F that map to $e_G$, show that Ker(φ) is a
normal subgroup of F.


14.1.3 Group actions on sets


Groups usually appear in physics as symmetries: they act on a physical 
object to change it in some way, perhaps while leaving some other property 
invariant. 

Suppose X is a set. We call its elements “points.” A group action on X
is a map g ∈ G : X → X that takes a point x ∈ X to a new point that we
denote by gx ∈ X, and such that $g_2(g_1x) = (g_2g_1)x$, and ex = x. There is
some standard vocabulary for group actions:


i) Given a point x ∈ X we define the orbit of x to be the set
$Gx \stackrel{\rm def}{=} \{gx : g \in G\} \subset X$.
ii) The action of the group is transitive if any orbit is the whole of X.
iii) The action is effective, or faithful, if the map g : X → X being the
identity map implies that g = e. Another way of saying this is that
the action is effective if the map G → Map(X → X) is one-to-one. If
the action of G is not faithful, the set of g ∈ G that act as the identity
map forms an invariant subgroup H of G, and the quotient group G/H
has a faithful action.

iv) The action is free if the existence of an x such that gx = x implies
that g = e. In this case, we equivalently say that g acts without fixed
points.

If the group acts freely and transitively then, having chosen a fiducial
point $x_0$, we can uniquely label every point in X by the group element g
such that $x = gx_0$. (If $g_1$ and $g_2$ both take $x_0 \to x$, then $g_1^{-1}g_2x_0 = x_0$. By
the free-action property we deduce that $g_1^{-1}g_2 = e$, and $g_1 = g_2$.) In this
case we might, for some purposes, identify X with G.

Suppose the group acts transitively, but not freely. Let H be the set
of elements that leaves $x_0$ fixed. This is clearly a subgroup of G, and if
$g_1x_0 = g_2x_0$ we have $g_1^{-1}g_2 \in H$, or $g_1H = g_2H$. The space X can therefore
be identified with the space of cosets G/H. Such sets are called quotient
spaces or homogeneous spaces. Many spaces of significance in physics can be
thought of as cosets in this way.
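
The identification of X with G/H can be seen explicitly in a small finite case. The sketch below is an illustration chosen for this purpose, not an example from the text: the symmetry group of a square acts on its four corners, labelled 0 to 3, with each group element stored as a tuple `g` where `g[x]` is the image of corner `x`. The action is transitive but not free, and each corner corresponds to exactly one left coset of the stabilizer of the fiducial corner.

```python
def compose(p, q):
    """(p o q)(x) = p[q[x]]."""
    return tuple(p[x] for x in q)

def generate(generators):
    """Close a set of permutations under composition (finite groups only)."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

rotation = (1, 2, 3, 0)      # quarter turn of the square
reflection = (0, 3, 2, 1)    # reflection fixing corners 0 and 2
group = generate([rotation, reflection])

x0 = 0                                              # fiducial point
orbit = {g[x0] for g in group}                      # G x0
stabilizer = {g for g in group if g[x0] == x0}      # H

# Group elements sorted by where they send x0: one left coset gH per point.
points_to_cosets = {}
for g in group:
    points_to_cosets.setdefault(g[x0], set()).add(g)

print(len(group), sorted(orbit), len(stabilizer))
```

Here |G| = 8, the orbit has 4 points and the stabilizer has 2 elements, so the number of points equals |G|/|H|.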

Example: The rotation group SO(3) acts transitively on the two-sphere $S^2$.
The SO(2) subgroup of rotations about the z axis leaves the north pole of
the sphere fixed. We can therefore identify $S^2 \simeq SO(3)/SO(2)$.

Many phase transitions are a result of spontaneous symmetry breaking.
For example the water → ice transition results in the continuous translation
invariance of the liquid water being broken down to the discrete translation
invariance of the crystal lattice of the solid ice. When a system with
symmetry group G spontaneously breaks the symmetry to a subgroup H, the set
of inequivalent ground states can be identified with the homogeneous space
G/H.


14.2 Representations 


An n-dimensional representation of a group G is formally defined to be a
homomorphism from G to a subgroup of GL(n, ℂ), the group of invertible
n-by-n matrices with complex entries. In effect, it is a set of n-by-n matrices
that obey the group multiplication rules

$$D(g_1)D(g_2) = D(g_1g_2), \qquad D(g^{-1}) = [D(g)]^{-1}. \tag{14.2}$$


Given such a representation, we can form another one D′(g) by
conjugation with any fixed invertible matrix C:

$$D'(g) = C^{-1}D(g)C. \tag{14.3}$$


If D’(g) is obtained from D(g) in this way, we say that D and D’ are equivalent 
representations and write D ~ D’. We can think of D and D’ as being 
matrices representing the same linear map, but in different bases. Our task 
in the rest of this chapter is to find and classify all representations of a finite 
group G, up to equivalence. 


Real and pseudo-real representations 


We can form a new representation from D(g) by setting

$$D'(g) = D^*(g),$$

where D*(g) denotes the matrix whose entries are the complex conjugates
of those in D(g). Suppose D* ∼ D. It may then be possible to find a
basis in which the matrices have only real entries. In this case we say the
representation is real. It may, however, be that D* ∼ D but we cannot
find a basis in which the matrices become real. In this case we say that D is
pseudo-real.

Example: Consider the defining representation of SU(2) (the group of 2-by-2 
unitary matrices with unit determinant). Such matrices are necessarily of 


the form

$$U = \begin{pmatrix} a & -b^* \\ b & a^* \end{pmatrix}, \tag{14.4}$$

where a and b are complex numbers with $|a|^2 + |b|^2 = 1$. They are therefore
specified by three real parameters, and so the group manifold is three
dimensional. Now

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} a & -b^* \\ b & a^* \end{pmatrix}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} a^* & -b \\ b^* & a \end{pmatrix} = U^*, \tag{14.5}$$

and so U ∼ U*. It is not possible to find a basis in which all SU(2) matrices
are simultaneously real, however. If such a basis existed then, in that basis,
a and b would be real, and we could specify the matrices by only two real
parameters, but we have seen that we need three real numbers to describe
all possible SU(2) matrices.
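
The equivalence U ∼ U* can be checked numerically for a sample matrix. The sketch below assumes the parametrization of an SU(2) matrix by a first column (a, b) and second column (−b*, a*) with |a|² + |b|² = 1, and conjugates with the fixed real antisymmetric matrix ε; the angle parametrization and helper names are illustrative choices, not the book's.

```python
import cmath
import math

def matmul(A, B):
    """Product of two 2-by-2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conjugate(A):
    """Entry-wise complex conjugate."""
    return [[complex(z).conjugate() for z in row] for row in A]

def su2(theta, alpha, beta):
    """SU(2) matrix with a = cos(theta) e^{i alpha}, b = sin(theta) e^{i beta}."""
    a = math.cos(theta) * cmath.exp(1j * alpha)
    b = math.sin(theta) * cmath.exp(1j * beta)
    return [[a, -b.conjugate()], [b, a.conjugate()]]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

EPSILON = [[0, 1], [-1, 0]]
EPSILON_INVERSE = [[0, -1], [1, 0]]

U = su2(0.7, 0.3, 1.1)                       # three real parameters
determinant = U[0][0] * U[1][1] - U[0][1] * U[1][0]
conjugated = matmul(matmul(EPSILON, U), EPSILON_INVERSE)

print(close(conjugated, conjugate(U)), abs(determinant - 1) < 1e-12)
```

The same fixed matrix ε works for every choice of the three parameters, which is what makes the two representations equivalent rather than merely related element by element.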


Direct sum and direct product 


We can obtain new representations from old by combining them. 
Given two representations $D^{(1)}(g)$ and $D^{(2)}(g)$, we can form their direct
sum $D^{(1)} \oplus D^{(2)}$ as the set of block-diagonal matrices

$$\begin{pmatrix} D^{(1)}(g) & 0 \\ 0 & D^{(2)}(g) \end{pmatrix}. \tag{14.6}$$


The dimension of this new representation is the sum of the dimensions of the 
two constituent representations. We are particularly interested in taking a 
representation and breaking it up as a direct sum of simpler representations. 

Given two representations $D^{(1)}(g)$, $D^{(2)}(g)$, we can combine them in a
different way by taking their direct product $D^{(1)} \otimes D^{(2)}$, which is the natural
action of the group on the tensor product of the representation spaces. In
other words, if $D^{(1)}(g)e^{(1)}_j = e^{(1)}_iD^{(1)}_{ij}(g)$ and $D^{(2)}(g)e^{(2)}_j = e^{(2)}_iD^{(2)}_{ij}(g)$, we
define

$$[D^{(1)} \otimes D^{(2)}](g)\,(e^{(1)}_i \otimes e^{(2)}_j) = (e^{(1)}_k \otimes e^{(2)}_l)\,D^{(1)}_{ki}(g)D^{(2)}_{lj}(g). \tag{14.7}$$

We think of $D^{(1)}_{ki}(g)D^{(2)}_{lj}(g)$ as being the entries in the direct-product
matrix

$$[D^{(1)}(g) \otimes D^{(2)}(g)]_{kl,ij},$$

whose rows and columns are indexed by pairs of numbers. The dimension of
the product representation is therefore the product of the dimensions of its
factors.
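
The pair-indexed matrix can be built explicitly. The following sketch (the function `kron` and the sample integer matrices are illustrative, not from the text) flattens the row pair (k, l) and the column pair (i, j) into single indices, and checks numerically that the construction multiplies compatibly, which is what is needed for the direct product of two representations to be a representation.

```python
def matmul(A, B):
    """Matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Direct product: entry in row (k, l), column (i, j) is A[k][i] * B[l][j]."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Arbitrary integer test matrices: 2-by-2 and 3-by-3.
A1 = [[1, 2], [3, 4]]
A2 = [[0, 1], [1, 1]]
B1 = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]
B2 = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]

product_of_krons = matmul(kron(A1, B1), kron(A2, B2))
kron_of_products = kron(matmul(A1, A2), matmul(B1, B2))

print(len(kron(A1, B1)), product_of_krons == kron_of_products)
```

The direct-product matrix here is 6-by-6, the product of the dimensions 2 and 3.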


Exercise 14.9: Show that if D(g) is a representation, then so is

$$D'(g) = [D(g^{-1})]^T,$$

where the superscript T denotes the transposed matrix.


Exercise 14.10: Show that a map that assigns every element of a group G to
the 1-by-1 identity matrix is a representation. It is, not unreasonably, called
the trivial representation.


Exercise 14.11: A representation D : G → GL(n, ℂ) that assigns an element
g ∈ G to the n-by-n identity matrix $I_n$ if and only if g = e is said to be
faithful. Let D be a non-trivial, but non-faithful, representation of G by
n-by-n matrices. Let H ⊂ G consist of those elements h such that $D(h) = I_n$.
Show that H is a normal subgroup of G, and that D descends to a faithful
representation of the quotient group G/H.


Exercise 14.12: Let A and B be linear maps from U → U and let C and D
be linear maps from V → V. Then the direct products A ⊗ C and B ⊗ D are
linear maps from U ⊗ V → U ⊗ V. Show that

$$(A \otimes C)(B \otimes D) = (AB) \otimes (CD).$$

Show also that

$$(A \oplus C)(B \oplus D) = (AB) \oplus (CD).$$


Exercise 14.13: Let A and B be m-by-m and n-by-n matrices respectively,
and let $I_n$ denote the n-by-n unit matrix. Show that:
i) tr(A ⊕ B) = tr(A) + tr(B).
ii) tr(A ⊗ B) = tr(A) tr(B).
iii) exp(A ⊕ B) = exp(A) ⊕ exp(B).
iv) exp(A ⊗ $I_n$ + $I_m$ ⊗ B) = exp(A) ⊗ exp(B).
v) det(A ⊕ B) = det(A) det(B).
vi) det(A ⊗ B) = $(\det(A))^n(\det(B))^m$.


14.2.1 Reducibility and irreducibility 


The “atoms” of representation theory are those representations that cannot, 
even by a clever choice of basis, be decomposed into, or reduced to, a direct 
sum of smaller representations. Such a representation is said to be irreducible. 
It is usually not easy to tell just by looking at a representation whether it is
reducible or not. To do this, we need to develop some tools. We begin with
a more powerful definition of irreducibility.

We first introduce the notion of an invariant subspace. Suppose we have
a set $\{A_\alpha\}$ of linear maps acting on a vector space V. A subspace U ⊆ V
is an invariant subspace for the set if x ∈ U ⇒ $A_\alpha x$ ∈ U for all $A_\alpha$.
The set $\{A_\alpha\}$ is irreducible if the only invariant subspaces are V itself and
{0}. Conversely, if there is a non-trivial invariant subspace, then the set⁴ of
operators is reducible.

If the $A_\alpha$'s possess a non-trivial invariant subspace U, and we decompose
V = U ⊕ U′, where U′ is a complementary subspace, then, in a basis adapted
to this decomposition, the matrices $A_\alpha$ take the block-partitioned form of
figure 14.1.


Figure 14.1: Block-partitioned reducible matrices.


If we can find a⁵ complementary subspace U′ that is also invariant, then we
have the block partitioned form of figure 14.2.


Figure 14.2: Completely reducible matrices.


We say that such matrices are completely reducible. When our linear
operators are unitary with respect to some inner product, we can take the
complementary subspace to be the orthogonal complement. This, by
unitarity, is automatically invariant. Thus, unitarity and reducibility imply
complete reducibility.


Schur's lemma

The most useful results concerning irreducibility come from:

Schur's lemma: Suppose we have two sets of linear operators $A_\alpha$ : U → U,
and $B_\alpha$ : V → V, that act irreducibly on their spaces, and an intertwining
operator Λ : U → V such that

$$\Lambda A_\alpha = B_\alpha\Lambda, \tag{14.8}$$

for all α, then either

a) Λ = 0,

or

b) Λ is 1-1 and onto (and hence invertible), in which case U and V have
the same dimension and $A_\alpha = \Lambda^{-1}B_\alpha\Lambda$.

The proof is straightforward: The relation (14.8) shows that Ker(Λ) ⊆ U and
Im(Λ) ⊆ V are invariant subspaces for the sets $\{A_\alpha\}$ and $\{B_\alpha\}$ respectively.
Consequently, either Λ = 0, or Ker(Λ) = {0} and Im(Λ) = V. In the latter
case Λ is 1-1 and onto, and hence invertible.

⁴ Irreducibility is a property of the set as a whole. Any individual matrix
always has a non-trivial invariant subspace because it possesses at least one
eigenvector.
⁵ Remember that complementary subspaces are not unique.

Corollary: If $\{A_\alpha\}$ acts irreducibly on an n-dimensional vector space, and
there is an operator Λ such that

$$\Lambda A_\alpha = A_\alpha\Lambda, \tag{14.9}$$

then either Λ = 0 or Λ = λI. To see this, observe that (14.9) remains true if
Λ is replaced by (Λ − xI). Now det(Λ − xI) is a polynomial in x of degree
n, and, by the fundamental theorem of algebra, has at least one root, x = λ.
Since its determinant is zero, (Λ − λI) is not invertible, and so must vanish
by Schur's lemma.


14.2.2 Characters and orthogonality 
Unitary representations of finite groups 


Let G be a finite group and let g ↦ D(g) be a representation of G by matrices
acting on a vector space V. Let ⟨x, y⟩ denote a positive-definite,
conjugate-symmetric, sesquilinear inner product of two vectors in V. From ⟨ , ⟩ we
construct a new inner product ⟨⟨ , ⟩⟩ by averaging over the group

$$\langle\langle x, y\rangle\rangle = \frac{1}{|G|}\sum_{g\in G}\langle D(g)x, D(g)y\rangle. \tag{14.10}$$

It is easy to see that this new inner product remains positive definite, and in
addition has the property that

$$\langle\langle D(g)x, D(g)y\rangle\rangle = \langle\langle x, y\rangle\rangle. \tag{14.11}$$

This means that the maps D(g) : V → V are unitary with respect to the
new product. If we change basis to one that is orthonormal with respect to
this new product then the D(g) become unitary matrices, with $D(g^{-1}) =
D^{-1}(g) = D^\dagger(g)$, where $D^\dagger_{ij}(g) = [D_{ji}(g)]^*$ denotes the conjugate-transposed
matrix.
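
The invariance property (14.11) follows from relabelling the group sum. As a short worked step, writing ⟨⟨ , ⟩⟩ for the group-averaged product of (14.10) and h for a fixed group element (the symbols h and g′ are introduced here only for this step, and any overall normalization of the average drops out of the argument):

```latex
\begin{aligned}
\langle\langle D(h)x, D(h)y\rangle\rangle
  &= \frac{1}{|G|}\sum_{g\in G}\langle D(g)D(h)x,\; D(g)D(h)y\rangle \\
  &= \frac{1}{|G|}\sum_{g\in G}\langle D(gh)x,\; D(gh)y\rangle \\
  &= \frac{1}{|G|}\sum_{g'\in G}\langle D(g')x,\; D(g')y\rangle
   = \langle\langle x, y\rangle\rangle .
\end{aligned}
```

The last step uses the fact that, for fixed h, the map g ↦ g′ = gh is a one-to-one map of G onto itself, so the sum over g′ runs over every group element exactly once.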

We conclude that representations of finite groups can always be taken
to be unitary. This leads to the important consequence that for such
representations reducibility implies complete reducibility. Warning: In this
construction it is essential that the sum over the g ∈ G converge. This is
guaranteed for a finite group, but may not work for infinite groups. In
particular, some non-compact Lie groups, such as the Lorentz group, have no
non-trivial finite-dimensional unitary representations.


Orthogonality of the matrix elements 


Now let D/(g) : Vj — Vy be the matrices of an irreducible representation 
or irrep. Here, J is a label that distinguishes inequivalent irreps from one 
another. We will use the symbol dim J to denote the dimension of the rep- 
resentation vector space Vz. 

Let $D^K$ be an irrep that is either identical to $D^J$ or inequivalent to it, and
let $M_{ij}$ be a matrix possessing the appropriate number of rows and columns
for the product $D^J M D^K$ to be defined, but otherwise arbitrary. The sum

$$A = \sum_{g\in G} D^J(g^{-1})\, M\, D^K(g) \tag{14.12}$$

obeys $D^J(g)A = A D^K(g)$ for any $g$. Consequently, Schur's lemma tells us
that

$$A_{il} = \sum_{g\in G} D^J_{ij}(g^{-1})\, M_{jk}\, D^K_{kl}(g) = \lambda(M)\,\delta_{il}\,\delta^{JK}. \tag{14.13}$$

We are here summing over repeated indices, and have written $\lambda(M)$ to stress
that the number $\lambda$ depends on the chosen matrix $M$. Now take $M$ to be zero
everywhere except for one entry of unity in row $j$, column $k$. Then we have

$$\sum_{g\in G} D^J_{ij}(g^{-1})\, D^K_{kl}(g) = \lambda_{jk}\,\delta_{il}\,\delta^{JK}, \tag{14.14}$$

where we have relabelled $\lambda$ to indicate its dependence on the location $(j,k)$
of the non-zero entry in $M$. We can find the constants $\lambda_{jk}$ by assuming that
$K = J$, setting $i = l$, and summing over $i$. We find

$$|G|\,\delta_{jk} = \lambda_{jk}\,\dim J. \tag{14.15}$$

Putting these results together we find that

$$\frac{1}{|G|}\sum_{g\in G} D^J_{ij}(g^{-1})\, D^K_{kl}(g) = \frac{1}{\dim J}\,\delta_{jk}\,\delta_{il}\,\delta^{JK}. \tag{14.16}$$

This matrix-element orthogonality theorem is often called the grand
orthogonality theorem because of its utility.
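A direct numerical check of (14.16) can make the index structure concrete. The following sketch is an illustration only: it assumes the two-dimensional irrep of $S_3$ written with integer (hence non-unitary) matrices in the basis $e_1 - e_2$, $e_2 - e_3$, a choice made here and not in the text. Because (14.16) involves $D(g^{-1})$ rather than a complex conjugate, it applies to this non-unitary form as well.

```python
from fractions import Fraction
from itertools import product

def matmul(A, B):
    """Product of 2x2 matrices stored as tuples of tuples (hashable)."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

I2 = ((1, 0), (0, 1))
s12 = ((-1, 1), (0, 1))   # transposition (12) in the basis e1-e2, e2-e3
s23 = ((1, 0), (1, -1))   # transposition (23) in the same basis

# Generate the whole group by closing the generators under multiplication.
group = {I2}
frontier = [I2]
while frontier:
    g = frontier.pop()
    for gen in (s12, s23):
        h = matmul(g, gen)
        if h not in group:
            group.add(h)
            frontier.append(h)

# Find each inverse by search inside the (finite) group.
inverse = {g: next(h for h in group if matmul(g, h) == I2) for g in group}

def lhs(i, j, k, l):
    """(1/|G|) sum_g D_ij(g^-1) D_kl(g)."""
    return Fraction(sum(inverse[g][i][j] * g[k][l] for g in group), len(group))

def rhs(i, j, k, l):
    """(1/dim J) delta_jk delta_il with dim J = 2."""
    return Fraction(int(j == k and i == l), 2)

theorem_holds = all(lhs(*idx) == rhs(*idx)
                    for idx in product(range(2), repeat=4))
print(len(group), theorem_holds)
```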

When our matrices $D(g)$ are unitary, we can write the orthogonality
theorem in a slightly prettier form:

$$\frac{1}{|G|}\sum_{g\in G}\left(D^J_{ji}(g)\right)^* D^K_{kl}(g) = \frac{1}{\dim J}\,\delta_{jk}\,\delta_{il}\,\delta^{JK}. \tag{14.17}$$

If we consider complex-valued functions $G \to \mathbb{C}$ as forming a vector space,
then the individual matrix entries $D^J_{ij}$ are elements of this space and this
form shows that they are mutually orthogonal with respect to the natural
sesquilinear inner product.

There can be no more orthogonal functions on $G$ than the dimension of
the function space itself, which is $|G|$. We therefore have a constraint

$$\sum_J (\dim J)^2 \le |G| \tag{14.18}$$

that places a limit on how many inequivalent representations can exist. In
fact, as you will show later, the equality holds: the sum of the squares of the
dimensions of the inequivalent irreducible representations is equal to the
order of $G$, and consequently the matrix elements form a complete orthogonal
set of functions on $G$.


Class functions and characters 


Because

$$\mathrm{tr}\,(C^{-1}DC) = \mathrm{tr}\,D, \tag{14.19}$$

the trace of a representation matrix is the same for equivalent representations.
Furthermore, because

$$\mathrm{tr}\,D(g_1^{-1} g g_1) = \mathrm{tr}\left(D^{-1}(g_1)D(g)D(g_1)\right) = \mathrm{tr}\,D(g), \tag{14.20}$$

the trace is the same for all group elements in a conjugacy class. The
character,

$$\chi(g) \stackrel{\rm def}{=} \mathrm{tr}\,D(g), \tag{14.21}$$

is therefore said to be a class function.

By taking the trace of the matrix-element orthogonality relation we see
that the characters $\chi^J = \mathrm{tr}\,D^J$ of the irreducible representations obey

$$\frac{1}{|G|}\sum_{g\in G}\left(\chi^J(g)\right)^*\chi^K(g) = \frac{1}{|G|}\sum_i d_i\left(\chi^J_i\right)^*\chi^K_i = \delta^{JK}, \tag{14.22}$$

where $d_i$ is the number of elements in the $i$-th conjugacy class.

The completeness of the matrix elements as functions on $G$ implies that
the characters form a complete orthonormal set of functions on the space of
conjugacy classes equipped with inner product

$$\langle\chi^1,\chi^2\rangle \stackrel{\rm def}{=} \frac{1}{|G|}\sum_i d_i\left(\chi^1_i\right)^*\chi^2_i. \tag{14.23}$$

Consequently there are exactly as many inequivalent irreducible
representations as there are conjugacy classes in the group.

Given a reducible representation, $D(g)$, we can find out exactly which
irreps $J$ it contains, and how many times, $n_J$, they occur. We do this by forming
the compound character

$$\chi(g) = \mathrm{tr}\,D(g) \tag{14.24}$$

and observing that if we can find a basis in which

$$D(g) = \left(D^1(g)\oplus D^1(g)\oplus\cdots\right)\oplus\left(D^2(g)\oplus D^2(g)\oplus\cdots\right)\oplus\cdots, \tag{14.25}$$

then

$$\chi(g) = n_1\chi^1(g) + n_2\chi^2(g) + \cdots. \tag{14.26}$$

From this we find that the multiplicities are given by

$$n_J = \langle\chi,\chi^J\rangle = \frac{1}{|G|}\sum_i d_i\left(\chi_i\right)^*\chi^J_i. \tag{14.27}$$
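The multiplicity formula (14.27) is easy to apply by machine. As a minimal sketch, assume the standard character table of the small group $S_3$ (classes: identity, three transpositions, two 3-cycles) and take as the reducible representation the action of $S_3$ permuting three objects, whose compound character is the number of objects left fixed. Both the choice of group and the variable names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Conjugacy classes of S3: identity, transpositions, 3-cycles.
class_sizes = [1, 3, 2]
order = sum(class_sizes)          # |G| = 6

# Standard character table of S3 (rows: irreps, columns: classes).
irreps = {
    "A1": [1, 1, 1],
    "A2": [1, -1, 1],
    "E":  [2, 0, -1],
}

def inner(chi1, chi2):
    """(1/|G|) sum_i d_i chi1_i^* chi2_i; all characters here are real."""
    return Fraction(sum(d * a * b for d, a, b in zip(class_sizes, chi1, chi2)),
                    order)

# Compound character of the permutation action: number of fixed objects.
chi_perm = [3, 1, 0]

multiplicities = {name: inner(chi_perm, chi) for name, chi in irreps.items()}
print(multiplicities)
```

The three-dimensional permutation representation therefore contains the trivial irrep once and the two-dimensional irrep once, and the sign representation not at all.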


There are extensive tables of group characters. Table 14.2 shows, for
example, the characters of the group $S_4$ of permutations on 4 objects.

              Typical element and class size
    S_4       e     (12)   (123)   (1234)   (12)(34)
    Irrep     1      6       8       6         3
    A_1       1      1       1       1         1
    A_2       1     -1       1      -1         1
    E         2      0      -1       0         2
    T_1       3      1       0      -1        -1
    T_2       3     -1       0       1        -1

Table 14.2: Character table of $S_4$.

Since $\chi^J(e) = \dim J$ we see that the irreps $A_1$ and $A_2$ are one dimensional,
that $E$ is two dimensional, and that $T_{1,2}$ are both three dimensional. Also
we confirm that the sum of the squares of the dimensions

$$1^2 + 1^2 + 2^2 + 3^2 + 3^2 = 24 = 4!$$

is equal to the order of the group.


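As a further illustration of how to read table 14.2, let us verify the
orthonormality of the characters of the representations $T_1$ and $T_2$. We have

$$\langle\chi^{T_1},\chi^{T_2}\rangle = \frac{1}{|G|}\sum_i d_i\left(\chi^{T_1}_i\right)^*\chi^{T_2}_i = \frac{1}{24}\left[1\cdot3\cdot3 - 6\cdot1\cdot1 + 8\cdot0\cdot0 - 6\cdot1\cdot1 + 3\cdot1\cdot1\right] = 0,$$

$$\langle\chi^{T_1},\chi^{T_1}\rangle = \frac{1}{|G|}\sum_i d_i\left(\chi^{T_1}_i\right)^*\chi^{T_1}_i = \frac{1}{24}\left[1\cdot3\cdot3 + 6\cdot1\cdot1 + 8\cdot0\cdot0 + 6\cdot1\cdot1 + 3\cdot1\cdot1\right] = 1.$$

The sum giving $\langle\chi^{T_2},\chi^{T_2}\rangle = 1$ is identical to this.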
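The same check can be run over every pair of rows at once. The sketch below assumes the standard character table of $S_4$ with classes ordered as $e$, $(12)$, $(123)$, $(1234)$, $(12)(34)$; which of the two three-dimensional rows is called $T_1$ is a labelling convention assumed here, and the check does not depend on it.

```python
from fractions import Fraction

class_sizes = [1, 6, 8, 6, 3]     # e, (12), (123), (1234), (12)(34)
order = sum(class_sizes)          # |S4| = 24

# Standard character table of S4 (T1/T2 labelling is an assumption).
table = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, -1, 1, -1, 1],
    "E":  [2, 0, -1, 0, 2],
    "T1": [3, 1, 0, -1, -1],
    "T2": [3, -1, 0, 1, -1],
}

def inner(chi1, chi2):
    """(1/|G|) sum_i d_i chi1_i^* chi2_i for real characters."""
    return Fraction(sum(d * a * b for d, a, b in zip(class_sizes, chi1, chi2)),
                    order)

# Gram matrix of the rows: should be the identity.
gram = {(J, K): inner(table[J], table[K]) for J in table for K in table}
orthonormal = all(value == (1 if J == K else 0) for (J, K), value in gram.items())

# chi^J(e) = dim J, so the first column gives the dimensions.
sum_of_squares = sum(chi[0] ** 2 for chi in table.values())
print(orthonormal, sum_of_squares == order)
```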


Exercise 14.14: Let $D^1$ and $D^2$ be representations with characters $\chi^1(g)$ and
$\chi^2(g)$ respectively. Show that the character of the direct product
representation $D^1\otimes D^2$ is given by

$$\chi^{1\otimes2}(g) = \chi^1(g)\,\chi^2(g).$$




14.2.3. The group algebra 


Given a finite group $G$, we construct a vector space $C(G)$ whose basis vectors
are in one-to-one correspondence with the elements of the group. We denote
the vector corresponding to the group element $g$ by the boldface symbol $\mathbf{g}$.
A general element of $C(G)$ is therefore a formal sum

$$\mathbf{x} = x_1\mathbf{g}_1 + x_2\mathbf{g}_2 + \cdots + x_{|G|}\mathbf{g}_{|G|}. \tag{14.28}$$

We take products of these sums by using the group multiplication rule. If
$g_1g_2 = g_3$ we set $\mathbf{g}_1\mathbf{g}_2 = \mathbf{g}_3$, and require the product to be distributive with
respect to vector-space addition. Thus

$$\mathbf{g}\mathbf{x} = x_1\mathbf{g}\mathbf{g}_1 + x_2\mathbf{g}\mathbf{g}_2 + \cdots + x_{|G|}\mathbf{g}\mathbf{g}_{|G|}. \tag{14.29}$$

The resulting mathematical structure is called the group algebra. It was
introduced by Frobenius.

The group algebra, considered as a vector space, is automatically a
representation. We define the natural action of $G$ on $C(G)$ by setting

$$D(g)\mathbf{g}_i = \mathbf{g}\mathbf{g}_i = \sum_j \mathbf{g}_j D_{ji}(g). \tag{14.30}$$

The matrices $D_{ji}(g)$ make up the regular representation. Because the list
$\mathbf{g}\mathbf{g}_1, \mathbf{g}\mathbf{g}_2, \ldots$ is a permutation of the list $\mathbf{g}_1, \mathbf{g}_2, \ldots$, their matrix entries
consist of 1's and 0's, with exactly one non-zero entry in each row and each
column.
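To see the regular representation explicitly, here is a small sketch for $S_3$, with group elements stored as permutation tuples. The choice of $S_3$, the tuple encoding, and the helper names are illustrative assumptions. The code builds $D_{ji}(g)$ from the rule $g g_i = g_j$, checks the representation property $D(g)D(h) = D(gh)$, and tabulates the traces.

```python
from itertools import permutations

elements = list(permutations(range(3)))          # the six elements of S3
index = {g: i for i, g in enumerate(elements)}
identity = tuple(range(3))

def compose(g, h):
    """Group product gh: apply h first, then g."""
    return tuple(g[h[i]] for i in range(len(h)))

def regular_matrix(g):
    """D_ji(g) = 1 if g g_i = g_j and 0 otherwise."""
    n = len(elements)
    D = [[0] * n for _ in range(n)]
    for i, gi in enumerate(elements):
        D[index[compose(g, gi)]][i] = 1
    return D

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Representation property D(g) D(h) = D(gh) for every pair.
homomorphism = all(
    matmul(regular_matrix(g), regular_matrix(h)) == regular_matrix(compose(g, h))
    for g in elements for h in elements)

# Each matrix is a permutation matrix: one 1 in every row and column.
is_permutation = all(
    all(sum(row) == 1 for row in regular_matrix(g)) and
    all(sum(col) == 1 for col in zip(*regular_matrix(g)))
    for g in elements)

characters = {g: trace(regular_matrix(g)) for g in elements}
print(homomorphism, is_permutation, characters)
```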


Exercise 14.15: Show that the character of the regular representation has
$\chi(e) = |G|$, and $\chi(g) = 0$, for $g \ne e$.


Exercise 14.16: Use the previous exercise to show that the number of times
an $n$ dimensional irrep occurs in the regular representation is $n$. Deduce that
$|G| = \sum_J (\dim J)^2$, and from this construct the completeness proof for the
representations and characters.


Projection operators 


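A representation $D^J$ of the group $G$ automatically provides a representation
of the group algebra. We simply set

$$D^J(x_1\mathbf{g}_1 + x_2\mathbf{g}_2 + \cdots) \stackrel{\rm def}{=} x_1D^J(g_1) + x_2D^J(g_2) + \cdots. \tag{14.31}$$

Certain linear combinations of group elements turn out to be very useful
because the corresponding matrices can be used to project out vectors
possessing desirable symmetry properties.

Consider the elements

$$e^J_{\alpha\beta} = \frac{\dim J}{|G|}\sum_{g\in G}\left[D^J_{\alpha\beta}(g)\right]^*\mathbf{g} \tag{14.32}$$

of the group algebra. These have the property that

$$\begin{aligned}
\mathbf{g}_1 e^J_{\alpha\beta} &= \frac{\dim J}{|G|}\sum_{g\in G}\left[D^J_{\alpha\beta}(g)\right]^*(\mathbf{g}_1\mathbf{g}) \\
&= \frac{\dim J}{|G|}\sum_{g\in G}\left[D^J_{\alpha\beta}(g_1^{-1}g)\right]^*\mathbf{g} \\
&= \left[D^J_{\alpha\gamma}(g_1^{-1})\right]^*\frac{\dim J}{|G|}\sum_{g\in G}\left[D^J_{\gamma\beta}(g)\right]^*\mathbf{g} \\
&= e^J_{\gamma\beta}\,D^J_{\gamma\alpha}(g_1).
\end{aligned} \tag{14.33}$$

In going from the first to the second line we have changed summation
variables from $g \to g_1^{-1}g$, and in going from the second to the third line we have
used the representation property to write $D^J(g_1^{-1}g) = D^J(g_1^{-1})D^J(g)$. The
last line uses unitarity, $[D^J_{\alpha\gamma}(g_1^{-1})]^* = D^J_{\gamma\alpha}(g_1)$.

From $\mathbf{g}_1 e^J_{\alpha\beta} = e^J_{\gamma\beta}D^J_{\gamma\alpha}(g_1)$ and the matrix-element orthogonality, it
follows that

$$\begin{aligned}
e^J_{\alpha\beta}\,e^K_{\gamma\delta} &= \frac{\dim J}{|G|}\sum_{g\in G}\left[D^J_{\alpha\beta}(g)\right]^*\mathbf{g}\,e^K_{\gamma\delta} \\
&= \frac{\dim J}{|G|}\sum_{g\in G}\left[D^J_{\alpha\beta}(g)\right]^*D^K_{\epsilon\gamma}(g)\,e^K_{\epsilon\delta} \\
&= \delta^{JK}\delta_{\alpha\epsilon}\delta_{\beta\gamma}\,e^K_{\epsilon\delta} \\
&= \delta^{JK}\delta_{\beta\gamma}\,e^J_{\alpha\delta}.
\end{aligned} \tag{14.34}$$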


For each $J$, this multiplication rule of the $e^J_{\alpha\beta}$ is identical to that of matrices
having zero entries everywhere except for the $(\alpha,\beta)$-th, which is a "1." There
are $(\dim J)^2$ of these $e^J_{\alpha\beta}$ for each irrep $J$, and they
are linearly independent. Because $\sum_J(\dim J)^2 = |G|$, they form a basis for
the algebra. In particular every element of $G$ can be reconstructed as

$$\mathbf{g} = \sum_J D^J_{ij}(g)\,e^J_{ij}. \tag{14.35}$$

We can also define the useful objects

$$P^J = \sum_i e^J_{ii} = \frac{\dim J}{|G|}\sum_{g\in G}\left[\chi^J(g)\right]^*\mathbf{g}. \tag{14.36}$$

They have the property

$$P^JP^K = \delta^{JK}P^K, \qquad \sum_J P^J = I, \tag{14.37}$$

where $I$ is the identity element of $C(G)$. The $P^J$ are therefore projection
operators composing a resolution of the identity. Their utility resides in the
fact that when $D(g)$ is a reducible representation acting on a linear space

$$V = \bigoplus_J V_J, \tag{14.38}$$

then setting $\mathbf{g} \to D(g)$ in the formula for $P^J$ results in a projection matrix
from $V$ onto the irreducible component $V_J$. To see how this comes about, let
$\mathbf{v} \in V$ and, for any fixed $p$, set

$$\mathbf{v}_i = e^J_{ip}\mathbf{v}, \tag{14.39}$$

where $e^J_{ip}\mathbf{v}$ should be understood as shorthand for $D(e^J_{ip})\mathbf{v}$. Then

$$D(g)\mathbf{v}_i = \mathbf{g}\,e^J_{ip}\mathbf{v} = e^J_{jp}\mathbf{v}\,D^J_{ji}(g) = \mathbf{v}_jD^J_{ji}(g). \tag{14.40}$$

We see that the $\mathbf{v}_i$, if not all zero, are basis vectors for $V_J$. Since $P^J$ is
a sum of the $e^J_{ij}$, the vector $P^J\mathbf{v}$ is a sum of such vectors, and therefore
lies in $V_J$. The advantage of using $P^J$ over any individual $e^J_{ij}$ is that $P^J$
can be computed from the character table, i.e. its construction does not require
knowledge of the irreducible representation matrices.
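The character-only construction of $P^J$ can be illustrated on a small case. The sketch below is a hypothetical example, not from the text: it takes $S_3$ acting on a three-dimensional space by permuting basis vectors, identifies the class of each element by its number of fixed points, and forms $P^J = (\dim J/|G|)\sum_g [\chi^J(g)]^* D(g)$ from the standard $S_3$ character table. Exact rational arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import permutations

elements = list(permutations(range(3)))   # S3 as permutation tuples
N = 3

def perm_matrix(g):
    """D(g) e_i = e_{g(i)}, so the entry in row g(i), column i is 1."""
    return [[Fraction(int(g[i] == j)) for i in range(N)] for j in range(N)]

def fixed_points(g):
    return sum(1 for i in range(N) if g[i] == i)

# Standard S3 characters, keyed by number of fixed points:
# 3 -> identity, 1 -> transposition, 0 -> 3-cycle.
characters = {
    "A1": {3: 1, 1: 1, 0: 1},
    "A2": {3: 1, 1: -1, 0: 1},
    "E":  {3: 2, 1: 0, 0: -1},
}

def projector(irrep):
    """P^J = (dim J / |G|) sum_g chi^J(g)^* D(g); characters are real here."""
    chi = characters[irrep]
    dim = chi[3]
    P = [[Fraction(0)] * N for _ in range(N)]
    for g in elements:
        D = perm_matrix(g)
        weight = Fraction(dim * chi[fixed_points(g)], len(elements))
        P = [[P[i][j] + weight * D[i][j] for j in range(N)] for i in range(N)]
    return P

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

P = {J: projector(J) for J in characters}
idempotent = all(matmul(P[J], P[J]) == P[J] for J in P)
resolution = [[sum(P[J][i][j] for J in P) for j in range(N)] for i in range(N)]
print(P["A1"], P["E"], idempotent)
```

In this example $P^{A_1}$ projects onto the symmetric vector $(1,1,1)$, $P^{A_2}$ vanishes because that irrep does not occur, and $P^E$ projects onto the two-dimensional sum-zero subspace.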


The algebra of classes 


If a conjugacy class $C_i$ consists of the elements $\{g_1, g_2, \ldots, g_{d_i}\}$, we can define
$\mathbf{C}_i$ to be the corresponding element of the group algebra:

$$\mathbf{C}_i = \frac{1}{d_i}\left(\mathbf{g}_1 + \mathbf{g}_2 + \cdots + \mathbf{g}_{d_i}\right). \tag{14.41}$$

(The factor of $1/d_i$ is a conventional normalization.) Because conjugation
merely permutes the elements of a conjugacy class, we have $\mathbf{g}^{-1}\mathbf{C}_i\mathbf{g} = \mathbf{C}_i$
for all $g \in G$. The $\mathbf{C}_i$ therefore commute with every element of $C(G)$.
Conversely any element of $C(G)$ that commutes with every element in $C(G)$
must be a linear combination: $\mathbf{C} = c_1\mathbf{C}_1 + c_2\mathbf{C}_2 + \cdots$. The subspace of $C(G)$
consisting of sums of the classes is therefore the centre $Z[C(G)]$ of the group
algebra. Because the product $\mathbf{C}_i\mathbf{C}_j$ commutes with every element, it lies in
$Z[C(G)]$, and so there are constants $c_{ij}{}^k$ such that

$$\mathbf{C}_i\mathbf{C}_j = \sum_k c_{ij}{}^k\,\mathbf{C}_k. \tag{14.42}$$

We can regard the $\mathbf{C}_i$ as being linear maps from $Z[C(G)]$ to itself, whose
associated matrices have entries $(\mathbf{C}_i)^k{}_j = c_{ij}{}^k$. These matrices commute,
and can be simultaneously diagonalized. We will leave it as an exercise for the
reader to demonstrate that

$$\mathbf{C}_iP^J = \left(\frac{\chi^J_i}{\chi^J_0}\right)P^J. \tag{14.43}$$

Here $\chi^J_0 \equiv \chi^J_{\{e\}} = \dim J$. The common eigenvectors of the $\mathbf{C}_i$ are therefore
the projection operators $P^J$, and the eigenvalues $\lambda^J_i = \chi^J_i/\chi^J_0$ are, up to
normalization, the characters. Equation (14.43) provides a convenient method
for computing the characters from knowledge only of the coefficients $c_{ij}{}^k$
appearing in the class multiplication table. Once we have found the
eigenvalues $\lambda^J_i$, we recover the $\chi^J_i$ by noting that $\chi^J_0$ is real and positive, and that

$$\sum_i d_i\,|\chi^J_i|^2 = |G|.$$


Exercise 14.17: Use Schur's lemma to show that for an irrep $D^J(g)$ we have

$$\frac{1}{d_i}\sum_{g\in C_i} D^J_{jk}(g) = \frac{1}{\dim J}\,\delta_{jk}\,\chi^J_i,$$

and hence establish (14.43).


14.3 Physics applications 


14.3.1 Quantum mechanics 


When a group $G = \{g_i\}$ acts on a mechanical system, then $G$ will act as a set of
linear operators $D(g)$ on the Hilbert space $\mathcal{H}$ of the corresponding quantum
system. Thus $\mathcal{H}$ will be a representation$^6$ space for $G$. If the group is a
symmetry of the system then the $D(g)$ will commute with the hamiltonian
$\hat H$. If this is so, and if we can decompose

$$\mathcal{H} = \bigoplus_{\text{irreps } J}\mathcal{H}_J \tag{14.44}$$

into $\hat H$-invariant irreps of $G$ then Schur's lemma tells us that in each $\mathcal{H}_J$ the
hamiltonian $\hat H$ will act as a multiple of the identity operator. In other words
every state in $\mathcal{H}_J$ will be an eigenstate of $\hat H$ with a common energy $E_J$.

This fact can greatly simplify the task of finding the energy levels. If
an irrep $J$ occurs only once in the decomposition of $\mathcal{H}$ then we can find the
eigenstates directly by applying the projection operator $P^J$ to vectors in $\mathcal{H}$.
If the irrep occurs $n_J$ times in the decomposition, then $P^J$ will project to the
reducible subspace

$$\underbrace{\mathcal{H}_J\oplus\mathcal{H}_J\oplus\cdots\oplus\mathcal{H}_J}_{n_J\ \text{copies}} = \mathcal{M}\otimes\mathcal{H}_J.$$

Here $\mathcal{M}$ is an $n_J$ dimensional multiplicity space. The hamiltonian $\hat H$ will act
in $\mathcal{M}$ as an $n_J$-by-$n_J$ matrix. In other words, if the vectors

$$|n,i\rangle = |n\rangle\otimes|i\rangle \in \mathcal{M}\otimes\mathcal{H}_J \tag{14.45}$$

form a basis for $\mathcal{M}\otimes\mathcal{H}_J$, with $n$ labelling which copy of $\mathcal{H}_J$ the vector $|n,i\rangle$
lies in, then

$$\hat H|n,i\rangle = |m,i\rangle H^J_{mn}, \qquad D(g)|n,i\rangle = |n,j\rangle D^J_{ji}(g). \tag{14.46}$$

Diagonalizing $H^J_{mn}$ provides us with $n_J$ $\hat H$-invariant copies of $\mathcal{H}_J$ and gives
us the energy eigenstates.

Consider, for example, the molecule C$_{60}$ (buckminsterfullerene) consisting
of 60 carbon atoms in the form of a soccer ball. The chemically active
electrons can be treated in a tight-binding approximation in which the Hilbert
space has dimension 60: one $\pi$-orbital basis state for each carbon
atom. The geometric symmetry group of the molecule is $Y_h = Y\times\mathbb{Z}_2$,
where $Y$ is the rotational symmetry group of the icosahedron (a subgroup
of SO(3)) and $\mathbb{Z}_2$ is generated by the parity inversion $\sigma: \mathbf{r}\mapsto-\mathbf{r}$. The characters of $Y$
are displayed in table 14.3. In this table $\tau = \frac12(\sqrt5 - 1)$ denotes the golden
mean. The class $C_5$ is the set of $2\pi/5$ rotations about an axis through the
centres of a pair of antipodal pentagonal faces, the class $C_3$ is the set
of $2\pi/3$ rotations about an axis through the centres of a pair of antipodal
hexagonal faces, and $C_2$ is the set of $\pi$ rotations through the midpoints of
a pair of antipodal edges, each lying between two adjacent hexagonal faces.
The geometric symmetry group acts on the 60-dimensional Hilbert space by
permuting the basis states concurrently with their associated atoms. Figure
14.3 shows how the 60 states are disposed into energy levels.$^7$ Each level is
labelled by a lower case letter specifying the irrep of $Y$, and by a subscript
$g$ or $u$ standing for gerade (German for even) or ungerade (German for odd)
that indicates whether the wavefunction is even or odd under the inversion
$\sigma: \mathbf{r}\mapsto-\mathbf{r}$.

Figure 14.3: A sketch of the tight-binding electronic energy levels of C$_{60}$.

              Typical element and class size
    Y         e      C_5      C_5^2     C_2     C_3
    Irrep     1      12       12        15      20
    A         1      1        1         1       1
    T_1       3      1/tau    -tau      -1      0
    T_2       3      -tau     1/tau     -1      0
    G         4      -1       -1        0       1
    H         5      0        0         1       -1

Table 14.3: Character table for the group $Y$.

The buckyball is roughly spherical, and the lowest 25 states can be
thought of as being derived from the $L = 0, 1, 2, 3, 4$ eigenstates, where $L$ is
the angular momentum quantum number that classifies the energy levels for
an electron moving on a perfect sphere. In the many-electron ground-state,
the 30 single-particle states with energy $E < 0$ are each occupied by
pairs of spin up/down electrons. The 30 states with $E > 0$ are empty.

$^6$ The rules of quantum mechanics only require that $D(g_1)D(g_2) = e^{i\phi(g_1,g_2)}D(g_1g_2)$.
A set of matrices that obeys the group multiplication rule "up to a phase" is called a
projective (or ray) representation. In many cases, however, we can choose the $D(g)$ so
that $\phi$ is not needed. This is the case in all the examples we discuss.

$^7$ After R. C. Haddon, L. E. Brus, K. Raghavachari, Chem. Phys. Lett. 125 (1986) 459.


To explain, for example, why three copies of $T_1$ appear, and why two
of these are $T_{1u}$ and one $T_{1g}$, we must investigate the manner in which the
60-dimensional Hilbert space decomposes into irreducible representations of
the 120-element group $Y_h$. Problem 14.23 leads us through this computation,
and shows that no irrep of $Y_h$ occurs more than three times. In finding the
energy levels, we therefore never have to diagonalize a matrix bigger than
3-by-3.


The equality of the energies of the $h_g$ and $g_g$ levels at $E = -1$ is an
accidental degeneracy. It is not required by the symmetry, and will
presumably disappear in a more sophisticated calculation. The appearance of many
"accidental" degeneracies in an energy spectrum hints that there may be a
hidden symmetry that arises from something beyond geometry. For example,
in the Schrödinger spectrum of the hydrogen atom all states with the same
principal quantum number $n$ have the same energy although they correspond
to different irreps $L = 0, \ldots, n-1$ of O(3). This degeneracy occurs because
the classical Kepler-orbit problem has symmetry group O(4), rather than the
naively expected O(3) rotational symmetry.


14.3.2 Vibrational spectrum of H$_2$O


The small vibrations of a mechanical system with $n$ degrees of freedom are
governed by a Lagrangian of the form

$$L = \frac12\dot{\mathbf{x}}^TM\dot{\mathbf{x}} - \frac12\mathbf{x}^TV\mathbf{x}, \tag{14.47}$$

where $M$ and $V$ are symmetric $n$-by-$n$ matrices, with $M$ being positive
definite. This Lagrangian leads to the equations of motion

$$M\ddot{\mathbf{x}} = -V\mathbf{x}. \tag{14.48}$$

We look for normal mode solutions $\mathbf{x}(t) \propto e^{i\omega_it}\mathbf{x}_i$, where the vectors $\mathbf{x}_i$ obey

$$\omega_i^2M\mathbf{x}_i = V\mathbf{x}_i. \tag{14.49}$$

The normal-mode frequencies are solutions of the secular equation

$$\det\left(V - \omega^2M\right) = 0, \tag{14.50}$$

and modes with distinct frequencies are orthogonal with respect to the inner
product defined by $M$,

$$\langle\mathbf{x},\mathbf{y}\rangle = \mathbf{x}^TM\mathbf{y}. \tag{14.51}$$

We are interested in solving this problem for vibrations about the
equilibrium configuration of a molecule. Suppose this equilibrium configuration
has a symmetry group $G$. This gives rise to an $n$-dimensional representation
on the space of $\mathbf{x}$'s in which

$$g: \mathbf{x}\mapsto D(g)\mathbf{x} \tag{14.52}$$

leaves both the inertia matrix $M$ and the potential matrix $V$ unchanged:

$$[D(g)]^TMD(g) = M, \qquad [D(g)]^TVD(g) = V. \tag{14.53}$$

Consequently, if we have an eigenvector $\mathbf{x}_i$ with frequency $\omega_i$,

$$\omega_i^2M\mathbf{x}_i = V\mathbf{x}_i, \tag{14.54}$$

we see that $D(g)\mathbf{x}_i$ also satisfies this equation: multiply (14.54) by
$([D(g)]^T)^{-1}$ and use (14.53) in the form $([D(g)]^T)^{-1}M = MD(g)$,
$([D(g)]^T)^{-1}V = VD(g)$. The frequency eigenspaces
are therefore left invariant by the action of $D(g)$, and barring accidental
degeneracy, there will be a one-to-one correspondence between the frequency
eigenspaces and the irreducible representations occurring in $D(g)$.


Consider, for example, the vibrational modes of the water molecule H$_2$O.
This familiar molecule has symmetry group $C_{2v}$, which is generated by two
elements: a rotation $a$ through $\pi$ about an axis through the oxygen atom,
and a reflection $b$ in the plane through the oxygen atom and bisecting the
angle between the two hydrogens. The product $ab$ is a reflection in the plane
defined by the equilibrium position of the three atoms. The relations are
$a^2 = b^2 = (ab)^2 = e$, and the characters are displayed in table 14.4.

              class and size
    C_2v      e      a      b      ab
    Irrep     1      1      1      1
    A_1       1      1      1      1
    A_2       1      1     -1     -1
    B_1       1     -1      1     -1
    B_2       1     -1     -1      1

Table 14.4: Character table of $C_{2v}$.

The group $C_{2v}$ is Abelian, so all the representations are one dimensional.

To find out what representations occur when $C_{2v}$ acts, we need to find
the character of its action $D(g)$ on the nine-dimensional vector

$$\mathbf{x} = (x_O, y_O, z_O, x_{H_1}, y_{H_1}, z_{H_1}, x_{H_2}, y_{H_2}, z_{H_2}). \tag{14.55}$$

Here the coordinates $x_{H_1}, y_{H_1}, z_{H_1}$ etc. denote the displacements of the
labelled atom from its equilibrium position.

We take the molecule as lying in the $xy$ plane, with the $z$ axis pointing towards
us.

Figure 14.4: Water Molecule.


The effect of the symmetry operations on the atomic displacements is

$$\begin{aligned}
D(a)\mathbf{x} &= (-x_O, +y_O, -z_O, -x_{H_2}, +y_{H_2}, -z_{H_2}, -x_{H_1}, +y_{H_1}, -z_{H_1}),\\
D(b)\mathbf{x} &= (-x_O, +y_O, +z_O, -x_{H_2}, +y_{H_2}, +z_{H_2}, -x_{H_1}, +y_{H_1}, +z_{H_1}),\\
D(ab)\mathbf{x} &= (+x_O, +y_O, -z_O, +x_{H_1}, +y_{H_1}, -z_{H_1}, +x_{H_2}, +y_{H_2}, -z_{H_2}).
\end{aligned}$$

Notice how the transformations $D(a)$, $D(b)$ have interchanged the
displacement co-ordinates of the two hydrogen atoms. In calculating the character
of a transformation we need look only at the effect on atoms that are left
fixed; those that are moved have matrix elements only in non-diagonal
positions. Thus, when computing the compound characters for $a$ and $b$, we can
focus on the oxygen atom. For $ab$ we need to look at all three atoms. We
find

$$\begin{aligned}
\chi^D(e) &= 9,\\
\chi^D(a) &= -1 + 1 - 1 = -1,\\
\chi^D(b) &= -1 + 1 + 1 = 1,\\
\chi^D(ab) &= 1 + 1 - 1 + 1 + 1 - 1 + 1 + 1 - 1 = 3.
\end{aligned}$$

By using the orthogonality relations, we find the decomposition

$$\begin{pmatrix}9\\-1\\1\\3\end{pmatrix} = 3\begin{pmatrix}1\\1\\1\\1\end{pmatrix} + \begin{pmatrix}1\\1\\-1\\-1\end{pmatrix} + 2\begin{pmatrix}1\\-1\\1\\-1\end{pmatrix} + 3\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix} \tag{14.56}$$

or

$$\chi^D = 3\chi^{A_1} + \chi^{A_2} + 2\chi^{B_1} + 3\chi^{B_2}. \tag{14.57}$$

Thus, the nine-dimensional representation decomposes as

$$D = 3A_1\oplus A_2\oplus 2B_1\oplus 3B_2. \tag{14.58}$$
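The bookkeeping behind this decomposition can be sketched in a few lines. In the code below each operation is encoded, as an assumption read off from the displacement transformations in the text, by how it permutes the three atoms and by the signs it gives to the $x$, $y$, $z$ displacement components; the $C_{2v}$ characters are those fixed by the symmetry assignments in the text (for instance $A_2$ is even under $a$ and odd under $b$ and $ab$). Only atoms mapped to themselves contribute to the trace.

```python
from fractions import Fraction

# Each operation: (atom permutation, signs applied to the x, y, z components).
OPS = {
    "e":  ({"O": "O", "H1": "H1", "H2": "H2"}, (1, 1, 1)),
    "a":  ({"O": "O", "H1": "H2", "H2": "H1"}, (-1, 1, -1)),   # rotation by pi
    "b":  ({"O": "O", "H1": "H2", "H2": "H1"}, (-1, 1, 1)),    # reflection x -> -x
    "ab": ({"O": "O", "H1": "H1", "H2": "H2"}, (1, 1, -1)),    # reflection z -> -z
}

# Characters of the four one-dimensional irreps of C2v.
TABLE = {
    "A1": {"e": 1, "a": 1, "b": 1, "ab": 1},
    "A2": {"e": 1, "a": 1, "b": -1, "ab": -1},
    "B1": {"e": 1, "a": -1, "b": 1, "ab": -1},
    "B2": {"e": 1, "a": -1, "b": -1, "ab": 1},
}

def compound_character(op):
    """Trace of the 9x9 matrix: fixed atoms each contribute the sum of signs."""
    image, signs = OPS[op]
    fixed_atoms = sum(1 for atom, target in image.items() if atom == target)
    return fixed_atoms * sum(signs)

chi_D = {op: compound_character(op) for op in OPS}

# n_J = (1/|G|) sum_g chi^J(g) chi^D(g); every class has one element.
multiplicities = {
    J: Fraction(sum(TABLE[J][op] * chi_D[op] for op in OPS), len(OPS))
    for J in TABLE
}
print(chi_D, multiplicities)
```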


How do we exploit this? First we cut out the junk. Out of the nine
modes, six correspond to easily identified zero-frequency motions: three of
translation and three rotations. A translation in the $x$ direction would have
$x_O = x_{H_1} = x_{H_2} = \xi$, all other entries being zero. This displacement vector
changes sign under both $a$ and $b$, but is left fixed by $ab$. This behaviour
is characteristic of the representation $B_2$. Similarly we can identify $A_1$ as
translation in $y$, and $B_1$ as translation in $z$. A rotation about the $y$ axis
makes $z_{H_1} = -z_{H_2} = \phi$. This is left fixed by $a$, but changes sign under $b$ and
$ab$, so the $y$ rotation mode is $A_2$. Similarly, rotations about the $x$ and $z$ axes
correspond to $B_1$ and $B_2$ respectively. All that is left for genuine vibrational
modes is $2A_1\oplus B_2$.
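The subtraction of the zero modes can also be done at the level of characters. The sketch below assumes, as is standard, that translations transform as an ordinary vector (character equal to the trace of the 3-by-3 matrix of the operation) and rotations as an axial vector (the same trace multiplied by the determinant). The sign triples and the compound character are those of the water-molecule discussion in the text; the variable names are illustrative.

```python
from fractions import Fraction

OPS = ["e", "a", "b", "ab"]
# Diagonal action of each operation on (x, y, z).
SIGNS = {"e": (1, 1, 1), "a": (-1, 1, -1), "b": (-1, 1, 1), "ab": (1, 1, -1)}
# Compound character of the nine-dimensional displacement representation.
CHI_D = {"e": 9, "a": -1, "b": 1, "ab": 3}
TABLE = {
    "A1": {"e": 1, "a": 1, "b": 1, "ab": 1},
    "A2": {"e": 1, "a": 1, "b": -1, "ab": -1},
    "B1": {"e": 1, "a": -1, "b": 1, "ab": -1},
    "B2": {"e": 1, "a": -1, "b": -1, "ab": 1},
}

def det(signs):
    return signs[0] * signs[1] * signs[2]

chi_trans = {g: sum(SIGNS[g]) for g in OPS}                 # polar vector
chi_rot = {g: det(SIGNS[g]) * sum(SIGNS[g]) for g in OPS}   # axial vector
chi_vib = {g: CHI_D[g] - chi_trans[g] - chi_rot[g] for g in OPS}

vibrational = {
    J: Fraction(sum(TABLE[J][g] * chi_vib[g] for g in OPS), len(OPS))
    for J in TABLE
}
print(chi_vib, vibrational)
```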

We now apply the projection operator

$$P^{A_1} = \frac14\left[(\chi^{A_1}(e))^*D(e) + (\chi^{A_1}(a))^*D(a) + (\chi^{A_1}(b))^*D(b) + (\chi^{A_1}(ab))^*D(ab)\right] \tag{14.59}$$

to $\mathbf{v}_{H_1,x}$, a small displacement of $H_1$ in the $x$ direction. We find

$$P^{A_1}\mathbf{v}_{H_1,x} = \frac14\left(\mathbf{v}_{H_1,x} - \mathbf{v}_{H_2,x} - \mathbf{v}_{H_2,x} + \mathbf{v}_{H_1,x}\right) = \frac12\left(\mathbf{v}_{H_1,x} - \mathbf{v}_{H_2,x}\right). \tag{14.60}$$

This mode is an eigenvector for the vibration problem.

If we apply $P^{A_1}$ to $\mathbf{v}_{H_1,y}$ and $\mathbf{v}_{O,y}$ we find

$$P^{A_1}\mathbf{v}_{H_1,y} = \frac12\left(\mathbf{v}_{H_1,y} + \mathbf{v}_{H_2,y}\right), \qquad P^{A_1}\mathbf{v}_{O,y} = \mathbf{v}_{O,y}, \tag{14.61}$$

but we are not quite done. These modes are contaminated by the $y$
translation direction zero mode, which is also in an $A_1$ representation. After
we make our modes orthogonal to this, there is only one left, and this has
$y_{H_1} = y_{H_2} = -y_Om_O/(2m_H) = a_1$, all other components vanishing.

We can similarly find vectors corresponding to $B_2$ as

$$P^{B_2}\mathbf{v}_{H_1,x} = \frac12\left(\mathbf{v}_{H_1,x} + \mathbf{v}_{H_2,x}\right), \qquad
P^{B_2}\mathbf{v}_{H_1,y} = \frac12\left(\mathbf{v}_{H_1,y} - \mathbf{v}_{H_2,y}\right), \qquad
P^{B_2}\mathbf{v}_{O,x} = \mathbf{v}_{O,x},$$

and these need to be cleared of both translations in the $x$ direction and
rotations about the $z$ axis, both of which transform under $B_2$. Again there
is only one mode left and it is

$$y_{H_1} = -y_{H_2} = \alpha x_{H_1} = \alpha x_{H_2} = \beta x_O = a_2, \tag{14.62}$$

where $\alpha$ is chosen to ensure that there is no angular momentum about $O$,
and $\beta$ to make the total $x$ linear momentum vanish. We have therefore
found three true vibration eigenmodes, two transforming under $A_1$ and one
under $B_2$ as advertised earlier. The eigenfrequencies, of course, depend on
the details of the spring constants, but now that we have the eigenvectors we
can just plug them in to find these.
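The projections quoted above can be reproduced mechanically. In this sketch a displacement vector is stored as a dictionary from (atom, axis) to coefficient; the encoding of the four operations (atom permutation plus signs on $x$, $y$, $z$) and the function name `project` are assumptions introduced here for illustration, consistent with the transformations listed in the text.

```python
from fractions import Fraction

AXES = "xyz"
# Each operation: (atom permutation, signs applied to x, y, z components).
OPS = {
    "e":  ({"O": "O", "H1": "H1", "H2": "H2"}, (1, 1, 1)),
    "a":  ({"O": "O", "H1": "H2", "H2": "H1"}, (-1, 1, -1)),
    "b":  ({"O": "O", "H1": "H2", "H2": "H1"}, (-1, 1, 1)),
    "ab": ({"O": "O", "H1": "H1", "H2": "H2"}, (1, 1, -1)),
}
CHARS = {
    "A1": {"e": 1, "a": 1, "b": 1, "ab": 1},
    "A2": {"e": 1, "a": 1, "b": -1, "ab": -1},
    "B1": {"e": 1, "a": -1, "b": 1, "ab": -1},
    "B2": {"e": 1, "a": -1, "b": -1, "ab": 1},
}

def project(irrep, atom, axis):
    """Apply P^J = (1/4) sum_g chi^J(g) D(g) to the unit displacement v_{atom,axis}."""
    result = {}
    for op, (image, signs) in OPS.items():
        # D(g) sends v_{atom,axis} to sign * v_{image(atom),axis}.
        key = (image[atom], axis)
        coeff = Fraction(CHARS[irrep][op] * signs[AXES.index(axis)], len(OPS))
        result[key] = result.get(key, 0) + coeff
    return {k: v for k, v in result.items() if v != 0}

print(project("A1", "H1", "x"))
print(project("A1", "H1", "y"))
print(project("B2", "H1", "y"))
print(project("B2", "O", "x"))
```

Removing the translational and rotational zero modes from these projected vectors still has to be done separately, for example by orthogonalizing with respect to the mass-weighted inner product.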


14.3.3 Crystal field splittings 


A quantum mechanical system has a symmetry $G$ if the hamiltonian $\hat H$ obeys

$$D^{-1}(g)\hat HD(g) = \hat H, \tag{14.63}$$

for some group action $D(g): \mathcal{H}\to\mathcal{H}$ on the Hilbert space. It follows that
the eigenspaces, $\mathcal{H}_\lambda$, of states with a common eigenvalue, $\lambda$, are invariant
subspaces for the representation $D(g)$.

We often need to understand how a degeneracy is lifted by perturbations
that break $G$ down to a smaller subgroup $H$. An $n$-dimensional irreducible
representation of $G$ is automatically a representation of any subgroup of $G$,
but in general it will no longer be irreducible. Thus the $n$-fold degenerate
level is split into multiplets, one for each of the irreducible representations
of $H$ contained in the original representation. The manner in which an
originally irreducible representation decomposes under restriction to a subgroup
is known as the branching rule for the representation.

A physically important case is given by the breaking of the full SO(3)
rotation symmetry of an isolated atomic hamiltonian by a crystal field.
Suppose the crystal has octahedral symmetry. The characters of the octahedral
group are displayed in table 14.5.

              Class(size)
    O         e(1)   C_3(8)   C_4^2(3)   C_2(6)   C_4(6)
    A_1       1      1        1          1        1
    A_2       1      1        1          -1       -1
    E         2      -1       2          0        0
    F_1       3      0        -1         -1       1
    F_2       3      0        -1         1        -1

Table 14.5: Character table of the octahedral group $O$.

The classes are labelled by the rotation angles, $C_2$ being a twofold rotation
axis ($\theta = \pi$), $C_3$ a threefold axis ($\theta = 2\pi/3$), etc.

The character of the $J = l$ representation of SO(3) is

$$\chi^l(\theta) = \frac{\sin(2l+1)\theta/2}{\sin\theta/2}, \tag{14.64}$$

and the first few $\chi^l$'s evaluated on the rotation angles of the classes of $O$ are
displayed in table 14.6.

              Class(size)
    l         e(1)   C_3(8)   C_4^2(3)   C_2(6)   C_4(6)
    0         1      1        1          1        1
    1         3      0        -1         -1       1
    2         5      -1       1          1        -1
    3         7      1        -1         -1       -1
    4         9      0        1          1        1

Table 14.6: Characters evaluated on rotation classes

The 9-fold degenerate $l = 4$ multiplet therefore decomposes as

$$\begin{pmatrix}9\\0\\1\\1\\1\end{pmatrix} = \begin{pmatrix}1\\1\\1\\1\\1\end{pmatrix} + \begin{pmatrix}2\\-1\\2\\0\\0\end{pmatrix} + \begin{pmatrix}3\\0\\-1\\-1\\1\end{pmatrix} + \begin{pmatrix}3\\0\\-1\\1\\-1\end{pmatrix} \tag{14.65}$$

or

$$\chi^4_{SO(3)} = \chi^{A_1} + \chi^E + \chi^{F_1} + \chi^{F_2}. \tag{14.66}$$

The octahedral crystal field splits the nine states into four multiplets with
symmetries $A_1$, $E$, $F_1$, $F_2$ and degeneracies 1, 2, 3 and 3, respectively.

We have considered only the simplest case here, ignoring the
complications introduced by reflection symmetries, and by 2-valued spinor
representations of the rotation group.
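The branching rule can be computed for any $l$ from the SO(3) character formula (14.64). The sketch below assumes the standard character table of the rotation group $O$, with classes ordered $e$, $C_3$, $C_4^2$, $C_2$, $C_4$ and with the label $F_1$ assigned to the irrep carried by ordinary vectors; that labelling, and the function names, are assumptions made here. Floating-point sums are rounded to the nearest integer at the end.

```python
import math

# (class name, class size, rotation angle)
CLASSES = [
    ("e", 1, 0.0),
    ("C3", 8, 2 * math.pi / 3),
    ("C4^2", 3, math.pi),
    ("C2", 6, math.pi),
    ("C4", 6, math.pi / 2),
]

# Standard character table of the octahedral rotation group O.
O_TABLE = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, 1, 1, -1, -1],
    "E":  [2, -1, 2, 0, 0],
    "F1": [3, 0, -1, -1, 1],
    "F2": [3, 0, -1, 1, -1],
}

def so3_character(l, theta):
    """chi^l(theta) = sin((2l+1) theta/2) / sin(theta/2), with limit 2l+1 at theta = 0."""
    if abs(math.sin(theta / 2)) < 1e-12:
        return 2 * l + 1
    return math.sin((2 * l + 1) * theta / 2) / math.sin(theta / 2)

def branching(l):
    """Multiplicity of each irrep of O in the spin-l representation of SO(3)."""
    order = sum(size for _, size, _ in CLASSES)
    chi = [so3_character(l, theta) for _, _, theta in CLASSES]
    return {
        name: round(sum(size * row_i * chi_i
                        for (_, size, _), row_i, chi_i in zip(CLASSES, row, chi)) / order)
        for name, row in O_TABLE.items()
    }

for l in range(5):
    print(l, branching(l))
```

For $l = 2$ the same routine gives $E\oplus F_2$, the familiar splitting of a $d$ level in a cubic environment.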


14.4 Further exercises and problems 


We begin with some technologically important applications of group theory 
to cryptography and number theory. 


Exercise 14.18: The non-zero elements of $\mathbb{Z}_n$ form a group under multiplication only when $n$ is
a prime number. Show, however, that the subset $U(\mathbb{Z}_n)\subset\mathbb{Z}_n$ of elements of
$\mathbb{Z}_n$ that are co-prime to $n$ is a group. It is the group of units of the ring $\mathbb{Z}_n$.


Exercise 14.19: Cyclic groups. A group $G$ is said to be cyclic if its elements
consist of powers $a^n$ of an element $a$, called the generator. The group will
be of finite order $|G| = m$ if $m$ is the smallest positive integer for which
$a^m = a^0 = e$.


a) Show that a group of prime order is necessarily cyclic, and that any
element other than the identity can serve as its generator. (Hint: Let
$a$ be any element other than $e$ and consider the subgroup consisting of
powers $a^m$.)

b) Show that any subgroup of a cyclic group is itself cyclic.


Exercise 14.20: Cyclic groups and cryptography. In a large cyclic group $G$
it can be relatively easy to compute $a^x$, but to recover $x$ given $h = a^x$ one
might have to compute $a^y$ and compare it with $h$ for every $1 < y < |G|$. If
$|G|$ has several hundred digits, such a brute force search could take longer
than the age of the universe. Rather more efficient algorithms for this discrete
logarithm problem exist, but the difficulty is still sufficient for it to be useful
in cryptography.

a) Diffie-Hellman key exchange. This algorithm allows Alice and Bob to
establish a secret key that can be used with a conventional cypher
without Eve, who is listening to their conversation, being able to reconstruct
it. Alice chooses a random element $g \in G$ and an integer $x$ between 1 and
$|G|$ and computes $g^x$. She sends $g$ and $g^x$ to Bob, but keeps $x$ to herself.
Bob chooses an integer $y$ and computes $g^y$ and $g^{xy} = (g^x)^y$. He keeps
$y$ secret and sends $g^y$ to Alice, who computes $g^{xy} = (g^y)^x$. Show that,
although Eve knows $g$, $g^y$ and $g^x$, she cannot obtain Alice and Bob's
secret key $g^{xy}$ without solving the discrete logarithm problem.

b) ElGamal public key encryption. This algorithm, based on Diffie-Hellman,
was invented by the Egyptian cryptographer Taher Elgamal. It is a
component of PGP and other modern encryption packages. To use
it, Alice first chooses a random integer $x$ in the range 1 to $|G|$ and
computes $h = a^x$. She publishes a description of $G$, together with the
elements $h$ and $a$, as her public key. She keeps the integer $x$ secret. To
send a message $m$ to Alice, Bob chooses an integer $y$ in the same range
and computes $c_1 = a^y$, $c_2 = mh^y$. He transmits $c_1$ and $c_2$ to Alice, but
keeps $y$ secret. Alice can recover $m$ from $c_1$, $c_2$ by computing $c_2(c_1^x)^{-1}$.
Show that, although Eve knows Alice's public key and has overheard $c_1$
and $c_2$, she nonetheless cannot decrypt the message without solving the
discrete logarithm problem.


Popular choices for $G$ are subgroups of $(\mathbb{Z}_p)^\times$, for large prime $p$. $(\mathbb{Z}_p)^\times$ is itself
cyclic (can you prove this?), but is unsuitable for technical reasons.
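Both protocols can be walked through with toy numbers. The sketch below is illustrative only: the prime $p = 23$, the generator $a = 5$, the secret exponents and the message are small hypothetical values chosen so the arithmetic can be followed by hand, and nothing this small is secure. It uses Python's built-in three-argument `pow` for modular exponentiation, and Fermat's little theorem to form the inverse.

```python
# Toy parameters (assumptions for illustration; far too small to be secure).
p = 23          # a prime, so the non-zero residues form a group under multiplication
a = 5           # a generator of that group

def mod_inverse(s, prime):
    """Inverse of s modulo a prime, via Fermat: s^(prime-2) = s^(-1)."""
    return pow(s, prime - 2, prime)

# --- Diffie-Hellman key exchange ---
x = 6                                # Alice's secret exponent
y = 15                               # Bob's secret exponent
alice_public = pow(a, x, p)          # a^x, sent to Bob
bob_public = pow(a, y, p)            # a^y, sent to Alice
alice_key = pow(bob_public, x, p)    # (a^y)^x
bob_key = pow(alice_public, y, p)    # (a^x)^y

# --- ElGamal encryption of a message m to Alice ---
h = alice_public                     # Alice's published h = a^x
m = 10                               # message, encoded as a non-zero residue
c1 = pow(a, y, p)                    # Bob's ephemeral a^y
c2 = (m * pow(h, y, p)) % p          # m h^y
recovered = (c2 * mod_inverse(pow(c1, x, p), p)) % p   # c2 (c1^x)^(-1)

print(alice_key == bob_key, recovered == m)
```

Eve sees `alice_public`, `bob_public`, `c1` and `c2`, but not `x` or `y`; in a group of realistic size, recovering either exponent is the discrete logarithm problem described above.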


Exercise 14.21: Modular arithmetic and number theory. An integer $a$ is said
to be a quadratic residue mod $p$ if there is an $r$ such that $a \equiv r^2 \pmod p$.
Let $p$ be an odd prime. Show that if $r_1^2 \equiv r_2^2 \pmod p$ then $r_1 \equiv \pm r_2 \pmod p$,
and that $r \not\equiv -r \pmod p$ for $r \not\equiv 0 \pmod p$. Deduce that exactly one half of the $p-1$ non-zero
elements of $\mathbb{Z}_p$ are quadratic residues.


Now consider the Legendre symbol

$$\left(\frac{a}{p}\right) \stackrel{\rm def}{=} \begin{cases} 0, & a \equiv 0 \pmod p,\\ 1, & a \text{ a quadratic residue (mod } p),\\ -1, & a \text{ not a quadratic residue (mod } p).\end{cases}$$

Show that

$$\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) = \left(\frac{ab}{p}\right),$$

and so the Legendre symbol forms a one-dimensional representation of the
multiplicative group $(\mathbb{Z}_p)^\times$. Combine this fact with the character orthogonality
theorem to give an alternative proof that precisely half the $p-1$ elements of
$(\mathbb{Z}_p)^\times$ are quadratic residues. (Hint: To show that the product of two
non-residues is a residue, observe that the set of residues is a normal subgroup of
$(\mathbb{Z}_p)^\times$, and consider the multiplication table of the resulting quotient group.)


Exercise 14.22: More practice with modular arithmetic. Again let $p$ be an
odd prime. Prove Euler's theorem that

$$a^{(p-1)/2} \equiv \left(\frac{a}{p}\right) \pmod p.$$

(Hint: Begin by showing that the usual school-algebra proof that an
equation of degree $n$ can have no more than $n$ solutions remains valid for
arithmetic modulo a prime number, and so $a^{(p-1)/2} \equiv 1 \pmod p$ can have no more
than $(p-1)/2$ roots. Cite Fermat's little theorem to show that these roots
must be the quadratic residues. Cite Fermat again to show that the quadratic
non-residues must then have $a^{(p-1)/2} \equiv -1 \pmod p$.)


The harder-to-prove law of quadratic reciprocity asserts that for $p$, $q$ distinct odd primes,
we have

$$(-1)^{(p-1)(q-1)/4}\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right).$$
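These statements are easy to test numerically for small primes before attempting the proofs. The sketch below is a check, not a proof: it compares a brute-force Legendre symbol with the value given by Euler's theorem, counts the quadratic residues, and tests multiplicativity and quadratic reciprocity. The list of primes and the function names are arbitrary choices made here.

```python
def legendre_bruteforce(a, p):
    """Legendre symbol from the definition: search for a square root of a mod p."""
    a %= p
    if a == 0:
        return 0
    return 1 if any((r * r - a) % p == 0 for r in range(1, p)) else -1

def legendre_euler(a, p):
    """Legendre symbol via Euler: a^((p-1)/2) mod p, with p-1 read as -1."""
    value = pow(a, (p - 1) // 2, p)
    return -1 if value == p - 1 else value

primes = [3, 5, 7, 11, 13, 17, 19, 23]

euler_ok = all(legendre_bruteforce(a, p) == legendre_euler(a, p)
               for p in primes for a in range(p))

half_are_residues = all(
    sum(1 for a in range(1, p) if legendre_bruteforce(a, p) == 1) == (p - 1) // 2
    for p in primes)

multiplicative = all(
    legendre_bruteforce(a, p) * legendre_bruteforce(b, p)
    == legendre_bruteforce(a * b, p)
    for p in primes for a in range(p) for b in range(p))

reciprocity_ok = all(
    (-1) ** (((p - 1) * (q - 1)) // 4) * legendre_euler(p, q) == legendre_euler(q, p)
    for p in primes for q in primes if p != q)

print(euler_ok, half_are_residues, multiplicative, reciprocity_ok)
```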


Problem 14.23: Buckyball spectrum. Consider the symmetry group of the C$_{60}$
buckyball molecule of figure 14.3.


a) Starting from the character table of the orientation-preserving
icosahedral group $Y$ (table 14.3), and using the fact that the $\mathbb{Z}_2$ parity inversion
$\sigma: \mathbf{r}\to-\mathbf{r}$ combines with $g \in Y$ so that $D^{J_g}(\sigma g) = D^{J_g}(g)$, whilst
$D^{J_u}(\sigma g) = -D^{J_u}(g)$, write down the character table of the extended
group $Y_h = Y\times\mathbb{Z}_2$ that acts as a symmetry on the C$_{60}$ molecule. There
are now ten conjugacy classes, and the ten representations will be
labelled $A_g$, $A_u$, etc. Verify that your character table has the expected
row-orthogonality properties.

b) By counting the number of atoms left fixed by each group operation,
compute the compound character of the action of $Y_h$ on the C$_{60}$ molecule.
(Hint: Examine the pattern of panels on a regulation soccer ball, and
deduce that four carbon atoms are left unmoved by operations in the
class $\sigma C_2$.)

c) Use your compound character from part b), to show that the 60-dimensional
Hilbert space decomposes as

$$\mathcal{H}_{C_{60}} = A_g\oplus T_{1g}\oplus 2T_{1u}\oplus T_{2g}\oplus 2T_{2u}\oplus 2G_g\oplus 2G_u\oplus 3H_g\oplus 2H_u,$$

consistent with the energy-levels sketched in figure 14.3.


Problem 14.24: The Frobenius-Schur Indicator. Recall that a real or pseudo-real
representation is one such that $D(g) \sim D^*(g)$, and for unitary matrices $D$
we have $D^*(g) = [D^T(g)]^{-1}$. In this unitary case $D(g)$ being real or pseudo-real
is equivalent to the statement that there exists an invertible matrix $F$
such that

$$FD(g)F^{-1} = [D^T(g)]^{-1}.$$

We can rewrite this statement as $D^T(g)FD(g) = F$, and so $F$ can be
interpreted as the matrix representing a $G$-invariant quadratic form.

i) Use Schur's lemma to show that when $D$ is irreducible the matrix $F$ is
unique up to an overall constant. In other words, $D^T(g)F_1D(g) = F_1$
and $D^T(g)F_2D(g) = F_2$ for all $g \in G$ implies that $F_2 = \lambda F_1$. Deduce
that for irreducible $D$ we have $F^T = \pm F$.

ii) By reducing $F$ to a suitable canonical form, show that $F$ is symmetric
($F = F^T$) in the case that $D(g)$ is a real representation, and $F$ is skew
symmetric ($F = -F^T$) when $D(g)$ is a pseudo-real representation.

iii) Now let $G$ be a finite group. For any matrix $U$, the sum

$$F_U = \frac{1}{|G|}\sum_{g\in G}D^T(g)\,U\,D(g)$$

is a $G$-invariant matrix. Deduce that $F_U$ is always zero when $D(g)$ is
neither real nor pseudo-real, and, by specializing both $U$ and the indices
on $F_U$, show that in the real or pseudo-real case

$$\sum_{g\in G}\chi(g^2) = \pm\sum_{g\in G}\chi(g)\chi(g),$$

where $\chi(g) = \mathrm{tr}\,D(g)$ is the character of the irreducible representation
$D(g)$. Deduce that the Frobenius-Schur indicator

$$\kappa \stackrel{\rm def}{=} \frac{1}{|G|}\sum_{g\in G}\chi(g^2)$$

takes the value $+1$, $-1$, or $0$ when $D(g)$ is, respectively, real, pseudo-real,
or not real.

Show that the identity representation occurs in the decomposition of the 
tensor product D(g) ® D(g) of an irrep with itself if, and only if, D(g) 
is real or pseudo-real. Given a basis e; for the vector space V on which 
D(g) acts, show the matrix F' can be used to construct the basis for the 
identity-representation subspace V“ in the decomposition 


V@eV= BD ve 


irreps J 
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Problem 14.25: Induced Representations. Suppose we know a representation 
DW (h) : W — W for a subgroup H C G. From this representation we can 
construct an induced representation Inde (Dp) for the larger group G. The 
construction cleverly combines the coset space G/H with the representation 
space W to make a (usually reducible) representation space IndG(W) of di- 
mension |G/H| x dimW. 

Recall that there is a natural action of G on the coset space G/H. If
$x = \{g_1, g_2, \ldots\} \in G/H$ then $gx$ is the coset $\{g g_1, g g_2, \ldots\}$. We select
from each coset $x \in G/H$ a representative element $a_x$, and observe that the
product $g a_x$ can be decomposed as $g a_x = a_{gx} h$, where $a_{gx}$ is the selected
representative from the coset $gx$ and $h$ is some element of H. Next we
introduce a basis $|n, x\rangle$ for $\mathrm{Ind}^G_H(W)$. We use the symbol “0” to label the
coset {e}, and take $|n, 0\rangle$ to be the basis vectors for W. For $h \in H$ we can
therefore set

\[ D(h)|n, 0\rangle = |m, 0\rangle D^W_{mn}(h). \]

We also define the result of the action of $a_x$ on $|n, 0\rangle$ to be the vector $|n, x\rangle$:

\[ D(a_x)|n, 0\rangle \stackrel{\rm def}{=} |n, x\rangle. \]

We may now obtain the action of a general element of G on the vectors
$|n, x\rangle$ by requiring $D(g)$ to be a representation, and so computing

\[ D(g)|n, x\rangle = D(g) D(a_x)|n, 0\rangle \]
\[ = D(g a_x)|n, 0\rangle \]
\[ = D(a_{gx} h)|n, 0\rangle \]
\[ = D(a_{gx}) D(h)|n, 0\rangle \]
\[ = D(a_{gx})|m, 0\rangle D^W_{mn}(h) \]
\[ = |m, gx\rangle D^W_{mn}(h). \]


i) Confirm that the action $D(g)|n, x\rangle = |m, gx\rangle D^W_{mn}(h)$, with $h$ obtained
from $g$ and $x$ via the decomposition $g a_x = a_{gx} h$, does indeed define a
representation of G. Show also that if we set $|f\rangle = \sum_{n,x} f_n(x)|n, x\rangle$,
then the action of $g$ on the components takes

\[ f_n(x) \to D^W_{nm}(h) f_m(g^{-1} x). \]


ii) Let $f(h)$ be a class function on H. Let us extend it to a function on G
by setting $f(g) = 0$ if $g \notin H$, and define

\[ \mathrm{Ind}^G_H[f](s) = \frac{1}{|H|} \sum_{g \in G} f(g^{-1} s g). \]

Show that $\mathrm{Ind}^G_H[f](s)$ is a class function on G, and further show that if
$\chi_W$ is the character of the starting representation for H then $\mathrm{Ind}^G_H[\chi_W]$
is the character of the induced representation of G. (Hint: only fixed
points of the G-action on G/H contribute to the character, and $gx = x$
means that $g a_x = a_x h$. Thus $D^W(h) = D^W(a_x^{-1} g a_x)$.)

iii) Given a representation $D^V(g) : V \to V$ of G we can trivially obtain a
(generally reducible) representation $\mathrm{Res}^G_H(V)$ of $H \subset G$ by restricting G
to H. Define the usual inner product on the group functions by

\[ \langle \phi_1, \phi_2 \rangle_G = \frac{1}{|G|} \sum_{g \in G} \phi_1(g^{-1}) \phi_2(g), \]

and show that if $\psi$ is a class function on H and $\phi$ a class function on G
then

\[ \langle \psi, \mathrm{Res}^G_H[\phi] \rangle_H = \langle \mathrm{Ind}^G_H[\psi], \phi \rangle_G. \]

Thus, $\mathrm{Ind}^G_H$ and $\mathrm{Res}^G_H$ are, in some sense, adjoint operations.
Mathematicians would call them a pair of mutually adjoint functors.

iv) By applying the result from part (iii) to the characters of the irreducible
representations of G and H, deduce Frobenius’ reciprocity theorem: The
number of times an irrep $D^J(g)$ of G occurs in the representation induced
from an irrep $D^K(h)$ of H is equal to the number of times that $D^K$ occurs
in the decomposition of $D^J$ into irreps of H.


The representations of the Poincaré group (= the SO(1,3) Lorentz group
together with space-time translations) that classify the states of a spin-J
elementary particle are those induced from the spin-J representation of its
SO(3) rotation subgroup. The quantum state of a mass m elementary particle is
therefore of the form $|k, \sigma\rangle$ where k is the particle’s four-momentum, which
lies in the coset SO(1,3)/SO(3), and $\sigma$ is the label from the $|J, \sigma\rangle$ spin state.
Chapter 15


Lie Groups 


A Lie group is a group which is also a smooth manifold G. The group 
operation of multiplication $(g_1, g_2) \mapsto g_3$ is required to be a smooth function,
as is the operation of taking the inverse of a group element. Lie groups are 
named after the Norwegian mathematician Sophus Lie. The examples most 
commonly met in physics are the infinite families of matrix groups GL(n), 
SL(n), O(n), SO(n), U(n), SU(n), and Sp(n), all of which we shall describe 
in this chapter, together with the family of five exceptional Lie groups: $G_2$,
$F_4$, $E_6$, $E_7$, and $E_8$, which have applications in string theory.

One of the properties of a Lie group is that, considered as a manifold, 
the neighbourhood of any point looks exactly like that of any other. Accord- 
ingly, the group’s dimension and much of its structure can be understood by 
examining the immediate vicinity of any chosen point, which we may as well
take to be the identity element. The vectors lying in the tangent space at 
the identity element make up the Lie algebra of the group. Computations in 
the Lie algebra are often easier than those in the group, and provide much of 
the same information. This chapter will be devoted to studying the interplay 
between the Lie group itself and this Lie algebra of infinitesimal elements. 


15.1 Matrix groups 


The Classical Groups are described in a book with this title by Hermann 
Weyl. They are subgroups of the general linear group, GL(n, F), which con- 
sists of invertible n-by-n matrices over the field F. We will mostly consider 
the cases F = C or F = R.

A near-identity matrix in GL(n, R) can be written $g = I + \epsilon A$, where A
is an arbitrary n-by-n real matrix. This matrix contains $n^2$ real entries, so
we can move away from the identity in $n^2$ distinct directions. The tangent
space at the identity, and hence the group manifold itself, is therefore $n^2$
dimensional. The manifold of GL(n, C) has $n^2$ complex dimensions, and this
corresponds to $2n^2$ real dimensions.

If we restrict the determinant of a GL(n, F) matrix to be unity, we get
the special linear group, SL(n, F). An element near the identity in this group
can still be written as $g = I + \epsilon A$, but since

\[ \det(I + \epsilon A) = 1 + \epsilon\, \mathrm{tr}(A) + O(\epsilon^2), \]   (15.1)

this requires tr(A) = 0. The restriction on the trace means that SL(n, R)
has dimension $n^2 - 1$.
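The first-order expansion of the determinant in (15.1) can be checked numerically. The following is a minimal sketch in standard-library Python, not part of the original text; the helper names `det` and `near_identity`, the matrix size, and the value of `eps` are illustrative assumptions.

```python
import random

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def near_identity(a, eps):
    """Return I + eps*A for a square matrix A given as a list of rows."""
    n = len(a)
    return [[(1.0 if i == j else 0.0) + eps * a[i][j] for j in range(n)]
            for i in range(n)]

random.seed(0)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
trace_A = sum(A[i][i] for i in range(n))

eps = 1e-4
# Difference between det(I + eps*A) and 1 + eps*tr(A): should be O(eps^2).
error = abs(det(near_identity(A, eps)) - (1.0 + eps * trace_A))

# Subtract the trace part of A: the determinant then stays at 1 to first order.
A0 = [[A[i][j] - (trace_A / n if i == j else 0.0) for j in range(n)]
      for i in range(n)]
error_traceless = abs(det(near_identity(A0, eps)) - 1.0)
```

Both `error` and `error_traceless` should be of order `eps**2`, which is the numerical counterpart of the statement that the tangent vectors to SL(n, R) at the identity are the traceless matrices.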


15.1.1 The unitary and orthogonal groups 


Perhaps the most important of the matrix groups are the unitary and or- 
thogonal groups. 


The unitary group 


The unitary group U(n) comprises the set of n-by-n complex matrices U such
that $U^\dagger = U^{-1}$. If we consider matrices near the identity

\[ U = I + \epsilon A, \]   (15.2)

with $\epsilon$ real, then unitarity requires

\[ I + O(\epsilon^2) = (I + \epsilon A)(I + \epsilon A^\dagger) = I + \epsilon (A + A^\dagger) + O(\epsilon^2), \]   (15.3)

so $A_{ij} = -(A_{ji})^*$, i.e. A is skew hermitian. A complex skew-hermitian
matrix contains

\[ n + 2 \times \frac{1}{2} n(n-1) = n^2 \]

real parameters. In this counting, the first “n” is the number of entries on
the diagonal, each of which must be of the form i times a real number. The
n(n - 1)/2 is the number of entries above the main diagonal, each of which
can be an arbitrary complex number. The number of real dimensions in the
group manifold is therefore $n^2$. As $U^\dagger U = I$, the rows or columns in the
matrix U form an orthonormal set of vectors. Their entries are therefore
bounded, $|U_{ij}| \le 1$, and this property leads to the $n^2$-dimensional group
manifold of U(n) being a compact set.

When a group manifold is compact, we say that the group itself is a 
compact group. There is a natural notion of volume on a group manifold 
and compact Lie groups have finite total volume. Because of this, they have 
many properties in common with the finite groups we studied in the previous 
chapter. 

Recall that a group is simple if it possesses no non-trivial invariant
subgroups. U(n) is not simple. Its centre Z is an invariant U(1) subgroup
consisting of matrices of the form $U = e^{i\theta} I$. The special unitary group
SU(n) consists of n-by-n unimodular (having determinant +1) unitary matrices.
It is not strictly simple because its centre Z consists of the discrete
subgroup of matrices $U_m = \omega^m I$ with $\omega$ an n-th root of unity, and this is
an invariant subgroup. Because the centre, its only invariant subgroup, is
not a continuous group, SU(n) is counted as being simple in Lie theory. With
$U = I + \epsilon A$, as above, the unimodularity imposes the additional constraint
on A that tr A = 0, so the SU(n) group manifold is $n^2 - 1$ dimensional.


The orthogonal group 


The orthogonal group O(n) consists of the set of real matrices O with the
property that $O^T = O^{-1}$. For a matrix in the neighbourhood of the identity,
$O = I + \epsilon A$, this property requires that A be skew symmetric: $A_{ij} = -A_{ji}$.
Skew symmetric real matrices have n(n - 1)/2 independent entries, and so the
group manifold of O(n) is n(n - 1)/2 dimensional. The condition $O^T O = I$
means that the rows or columns of O, considered as row or column vectors,
are orthonormal. All entries are bounded, i.e. $|O_{ij}| \le 1$, and again this leads
to O(n) being a compact group.

The identity

\[ 1 = \det(O^T O) = \det O^T \det O = (\det O)^2 \]   (15.4)

tells us that $\det O = \pm 1$. The subset of orthogonal matrices with det O = +1
constitutes a subgroup of O(n) called the special orthogonal group SO(n). The
unimodularity condition discards a disconnected part of the group manifold
and does not reduce its dimension, which remains n(n - 1)/2.
15.1.2 Symplectic groups


The symplectic groups (named from the Greek, meaning to “fold together”)
are perhaps less familiar than the other matrix groups.

We start with a non-degenerate skew-symmetric matrix $\omega$. The symplectic
group Sp(2n, F) is then defined by

\[ Sp(2n, F) = \{ S \in GL(2n, F) : S^T \omega S = \omega \}. \]   (15.5)

Here F can be R or C. When F = C, we still use the transpose “T,” not
the adjoint “†,” in this definition. Setting $S = I_{2n} + \epsilon A$ and demanding that
$S^T \omega S = \omega$ shows that $A^T \omega + \omega A = 0$.

It does not matter what skew matrix $\omega$ we start from, because we can
always find a basis in which $\omega$ takes its canonical form:

\[ \omega = \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}. \]   (15.6)

In this basis we find, after a short computation, that the most general form
for A is

\[ A = \begin{pmatrix} a & b \\ c & -a^T \end{pmatrix}. \]   (15.7)

Here, a is any n-by-n matrix, and b and c are symmetric (i.e. $b^T = b$ and
$c^T = c$) n-by-n matrices. If the matrices are real, then counting the degrees
of freedom gives the dimension of the real symplectic group as

\[ \dim Sp(2n, R) = n^2 + 2 \times \frac{n}{2}(n+1) = n(2n+1). \]   (15.8)

The entries in a, b, c can be arbitrarily large. Sp(2n, R) is not compact.
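The block form of A and the dimension count can be spot-checked numerically. The sketch below is illustrative only: it assumes the canonical form with off-diagonal blocks $-I_n$ and $I_n$ (the check is insensitive to the overall sign of $\omega$), and the function names are hypothetical.

```python
import random

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def canonical_omega(n):
    """2n-by-2n block matrix [[0, -I], [I, 0]] (assumed sign convention)."""
    w = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        w[i][n + i] = -1.0
        w[n + i][i] = 1.0
    return w

def random_sp_generator(n, rng):
    """A = [[a, b], [c, -a^T]] with a arbitrary and b, c symmetric."""
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

    def sym():
        s = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        return [[0.5 * (s[i][j] + s[j][i]) for j in range(n)] for i in range(n)]

    b, c = sym(), sym()
    top = [a[i] + b[i] for i in range(n)]
    bottom = [c[i] + [-a[j][i] for j in range(n)] for i in range(n)]
    return top + bottom

def residual(A, w):
    """Largest entry of A^T w + w A, which should vanish for a generator."""
    lhs, rhs = matmul(transpose(A), w), matmul(w, A)
    size = len(A)
    return max(abs(lhs[i][j] + rhs[i][j]) for i in range(size) for j in range(size))

def dim_sp(n):
    """Parameter count: n^2 for a, plus n(n+1)/2 each for symmetric b and c."""
    return n * n + 2 * (n * (n + 1) // 2)
```

For example, `residual(random_sp_generator(2, random.Random(1)), canonical_omega(2))` should vanish to rounding error, and `dim_sp(n)` reproduces n(2n + 1).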
The determinant of any symplectic matrix is +1. To see this take the
elements of $\omega$ to be $\omega_{ij}$, and let

\[ \omega(x, y) = \omega_{ij} x^i y^j \]   (15.9)

be the associated skew bilinear (not sesquilinear) form. Then Weyl’s identity
from exercise A.19 shows that

\[ \mathrm{Pf}(\omega)\, (\det M)\, \det|x_1, \ldots, x_{2n}|
   = \frac{1}{2^n n!} \sum_{\pi \in S_{2n}} \mathrm{sgn}(\pi)\,
     \omega(M x_{\pi(1)}, M x_{\pi(2)}) \cdots \omega(M x_{\pi(2n-1)}, M x_{\pi(2n)}), \]

for any linear map M. If $\omega(x, y) = \omega(Mx, My)$, the right-hand side is the
same as it is for M = I, and since $\mathrm{Pf}(\omega) \ne 0$ we conclude that det M = 1.
But preserving $\omega$ is exactly the condition that M be an element of
the symplectic group. Since the matrices in Sp(2n, F) are automatically
unimodular there is no “special symplectic” group.


Unitary symplectic group 


The intersection of two groups is also a group. We therefore define the unitary
symplectic group as

\[ Sp(n) = Sp(2n, C) \cap U(2n). \]   (15.10)

This group is compact, a property it inherits from the compactness of the
U(2n) in which it is embedded as a subgroup. We will see that its dimension
is n(2n + 1), the same as the non-compact Sp(2n, R). Sp(n) may also be
defined as U(n, H) where H denotes the skew field of quaternions.
Warning: Physics papers often make no distinction between Sp(n), which 
is a compact group, and Sp(2n,R) which is non-compact. To add to the 
confusion the compact Sp(n) is also sometimes called Sp(2n). You have to 
judge from the context what group the author has in mind. 
Physics Application: Kramers’ degeneracy.

Let $\sigma_i$ be the Pauli matrices, and L the orbital angular momentum
operator. The matrix $C = i\sigma_2$ has the property that

\[ C^{-1} \sigma_i C = -\sigma_i^*. \]   (15.11)

A time-reversal invariant Hamiltonian containing L·S spin-orbit interactions
obeys

\[ C^{-1} H C = H^*. \]   (15.12)

We regard the 2n-by-2n matrix H as being an n-by-n matrix whose entries
$H_{ij}$ are themselves 2-by-2 matrices. We therefore expand these entries as

\[ H_{ij} = h^0_{ij} + i \sum_{n=1}^{3} h^n_{ij} \sigma_n. \]

The condition (15.12) now implies that the $h^a_{ij}$ are real numbers. We therefore
say that H is real quaternionic. This is because the Pauli sigma matrices are
algebraically isomorphic to Hamilton’s quaternions under the identification

\[ i\sigma_1 \leftrightarrow i, \]
\[ i\sigma_2 \leftrightarrow j, \]   (15.13)
\[ i\sigma_3 \leftrightarrow k. \]
The hermiticity of H requires that $H_{ji} = \overline{H_{ij}}$ where the overbar denotes
quaternionic conjugation, i.e. the mapping

\[ q^0 + i q^1 \sigma_1 + i q^2 \sigma_2 + i q^3 \sigma_3 \to q^0 - i q^1 \sigma_1 - i q^2 \sigma_2 - i q^3 \sigma_3. \]   (15.14)

If $H\psi = E\psi$, then $HC\psi^* = EC\psi^*$. Since C is skew, $\psi$ and $C\psi^*$ are necessarily
orthogonal. Therefore all states are doubly degenerate. This is Kramers’
degeneracy.
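The two algebraic facts used in this argument, the property (15.11) of $C = i\sigma_2$ and the orthogonality of $\psi$ and $C\psi^*$, are easy to confirm for explicit 2-by-2 matrices. The following standard-library Python sketch is an illustration only; the spinor used in the usage note is an arbitrary choice.

```python
def matmul(x, y):
    """Product of two 2-by-2 matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

SIGMA = [
    [[0, 1], [1, 0]],       # sigma_1
    [[0, -1j], [1j, 0]],    # sigma_2
    [[1, 0], [0, -1]],      # sigma_3
]
C = [[0, 1], [-1, 0]]       # C = i * sigma_2
C_INV = [[0, -1], [1, 0]]   # C^{-1} = -C

def time_reversal_defect(s):
    """Largest entry of C^{-1} s C + s^*, which should vanish for Pauli matrices."""
    lhs = matmul(matmul(C_INV, s), C)
    return max(abs(lhs[i][j] + complex(s[i][j]).conjugate())
               for i in range(2) for j in range(2))

def kramers_overlap(psi):
    """Inner product <psi | C psi^*> for a two-component spinor psi."""
    partner = [sum(C[i][j] * psi[j].conjugate() for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * partner[i] for i in range(2))
```

For instance, `kramers_overlap([1 + 2j, 3 - 1j])` evaluates to zero: the overlap is a contraction of the skew matrix C with the symmetric product of the components of $\psi^*$, so it vanishes for every spinor.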

H may be diagonalized by a matrix in U(n, H), where U(n, H) consists
of those elements of U(2n) that satisfy $C^{-1} U C = U^*$. We may rewrite this
condition as

\[ C^{-1} U C = U^* \Rightarrow U^T C U = C, \]

so U(n, H) consists of the unitary matrices that preserve the skew symmetric
matrix C. Thus $U(n, H) \subseteq Sp(n)$. Further investigation shows that
U(n, H) = Sp(n).

We can exploit the quaternionic viewpoint to count the dimensions. Let
$U = I + \epsilon B$ be in U(n, H). Then $B_{ij} + \overline{B_{ji}} = 0$. The diagonal elements of B are
thus pure “imaginary” quaternions having no part proportional to I. There
are therefore 3 parameters for each diagonal element. The upper triangle has
n(n - 1)/2 independent elements, each with 4 parameters. Counting up, we
find

\[ \dim U(n, H) = \dim Sp(n) = 3n + 4 \times \frac{n}{2}(n-1) = n(2n+1). \]   (15.15)

Thus, as promised, we see that the compact group Sp(n) and the non-compact
group Sp(2n, R) have the same dimension.

We can also count the dimension of Sp(n) by looking at our previous
matrices

\[ A = \begin{pmatrix} a & b \\ c & -a^T \end{pmatrix} \]

where a, b and c are now allowed to be complex, but with the restriction that
$S = I + \epsilon A$ be unitary. This requires A to be skew-hermitian, so $a = -a^\dagger$,
and $c = -b^\dagger$, while b (and hence c) remains symmetric. There are $n^2$ free
real parameters in a, and n(n + 1) in b, so

\[ \dim Sp(n) = n^2 + n(n+1) = n(2n+1), \]


as before. 
Exercise 15.1: Show that

\[ SO(2N) \cap Sp(2N, R) \cong U(N). \]

Hint: Group the 2N basis vectors on which O(2N) acts into pairs $x_n$ and $y_n$,
n = 1, ..., N. Assemble these pairs into $z_n = x_n + i y_n$ and $\bar{z}_n = x_n - i y_n$. Let
$\omega$ be the linear map that takes $x_n \to y_n$ and $y_n \to -x_n$. Show that the subset
of SO(2N) that commutes with $\omega$ mixes $z_n$’s only with $z_n$’s and $\bar{z}_n$’s only with
$\bar{z}_n$’s.


15.2 Geometry of SU(2) 


To get a sense of Lie groups as geometric objects, we will study the simplest 
non-trivial case, SU(2), in some detail. 
A general 2-by-2 complex matrix can be parametrized as

\[ U = \begin{pmatrix} x^0 + i x^3 & i x^1 + x^2 \\ i x^1 - x^2 & x^0 - i x^3 \end{pmatrix}. \]   (15.16)

The determinant of this matrix is unity, provided

\[ (x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2 = 1. \]   (15.17)

When this condition is met, and if in addition the $x^\mu$ are real, the matrix is
unitary: $U^\dagger = U^{-1}$. The group manifold of SU(2) can therefore be identified
with the three-sphere $S^3$. We will take as local co-ordinates $x^1$, $x^2$, $x^3$. When
we desire to know $x^0$ we will find it from $x^0 = \sqrt{1 - (x^1)^2 - (x^2)^2 - (x^3)^2}$.
This co-ordinate chart only labels the points in the half of the three-sphere
having $x^0 > 0$, but this is typical of any non-trivial manifold. A complete
atlas of charts can be constructed if needed.
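We can simplify our notation by using the Pauli sigma matrices

\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
   \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
   \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]   (15.18)

These obey

\[ [\sigma_i, \sigma_j] = 2i \epsilon_{ijk} \sigma_k, \quad \text{and} \quad \sigma_i \sigma_j + \sigma_j \sigma_i = 2\delta_{ij} I. \]   (15.19)

In terms of them, we can write the general element of SU(2) as

\[ g = U = x^0 I + i x^1 \sigma_1 + i x^2 \sigma_2 + i x^3 \sigma_3. \]   (15.20)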
Elements of the group in the neighbourhood of the identity differ from e = I
by real linear combinations of the $i\sigma_i$. The three-dimensional vector space
spanned by these matrices is therefore the tangent space $TG_e$ at the identity
element. For any Lie group, this tangent space is called the Lie algebra,
$\mathfrak{g} = \mathrm{Lie}\, G$, of the group. There will be a similar set of matrices $i\lambda_i$ for any
matrix group. They are called the generators of the Lie algebra, and satisfy
commutation relations of the form

\[ [i\lambda_i, i\lambda_j] = -f_{ij}{}^k (i\lambda_k), \]   (15.21)

or equivalently

\[ [\lambda_i, \lambda_j] = i f_{ij}{}^k \lambda_k. \]   (15.22)

The $f_{ij}{}^k$ are called the structure constants of the algebra. The “i”’s
associated with the $\lambda$’s in this expression are conventional in physics texts
because for quantum mechanics applications we usually desire the $\lambda_i$ to be
hermitian. They are usually absent in books aimed at mathematicians.
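For SU(2), with $\lambda_i = \sigma_i$, the structure constants can be read off from explicit commutators. The sketch below (standard-library Python; the function names are illustrative, and indices run over 0, 1, 2 instead of 1, 2, 3) extracts $f_{ij}{}^k$ from $[\sigma_i, \sigma_j] = i f_{ij}{}^k \sigma_k$ using $\mathrm{tr}(\sigma_k \sigma_l) = 2\delta_{kl}$.

```python
def matmul(x, y):
    """Product of two 2-by-2 matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]

SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices drawn from 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def structure_constant(i, j, k):
    """f_ij^k defined by [sigma_i, sigma_j] = i f_ij^k sigma_k."""
    m = matmul(SIGMA[k], commutator(SIGMA[i], SIGMA[j]))
    trace = m[0][0] + m[1][1]          # equals 2 i f_ij^k
    return (trace / 2j).real
```

With these definitions `structure_constant(0, 1, 2)` returns 2.0, and in general the result agrees with `2 * levi_civita(i, j, k)`, as (15.19) requires.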


Exercise 15.2: Let $\lambda_1$ and $\lambda_2$ be hermitian matrices. Show that if we define
$\lambda_3$ by the relation $[\lambda_1, \lambda_2] = i\lambda_3$, then $\lambda_3$ is also a hermitian matrix.


Exercise 15.3: For the group O(n) the matrices “$i\lambda$” are real n-by-n skew
symmetric matrices A. Show that if $A_1$ and $A_2$ are real skew symmetric
matrices, then so is $[A_1, A_2]$.


Exercise 15.4: For the group Sp(2n, R) the $i\lambda$ matrices are of the form

\[ A = \begin{pmatrix} a & b \\ c & -a^T \end{pmatrix} \]

where a is any real n-by-n matrix and b and c are symmetric ($b^T = b$ and
$c^T = c$) real n-by-n matrices. Show that the commutator of any two matrices
of this form is also of this form.


15.2.1 Invariant vector fields 


Consider a matrix group, and in it a group element $I + i\epsilon\lambda_i$ lying close to
the identity e = I. Draw an arrow connecting I to $I + i\epsilon\lambda_i$, and regard
this arrow as a vector $L_i$ lying in $TG_e$. Next, map the infinitesimal element
$I + i\epsilon\lambda_i$ to the neighbourhood of an arbitrary group element g by multiplying
on the left to get $g(I + i\epsilon\lambda_i)$. By drawing an arrow from g to $g(I + i\epsilon\lambda_i)$, we
obtain a vector $L_i(g)$ lying in $TG_g$. This vector at g is the push-forward of
the vector at e by left multiplication by g. For example, consider SU(2) with
infinitesimal element $I + i\epsilon\sigma_3$. We find

\[ g(I + i\epsilon\sigma_3) = (x^0 + i x^1 \sigma_1 + i x^2 \sigma_2 + i x^3 \sigma_3)(I + i\epsilon\sigma_3) \]
\[ = (x^0 - \epsilon x^3) + i\sigma_1 (x^1 - \epsilon x^2) + i\sigma_2 (x^2 + \epsilon x^1) + i\sigma_3 (x^3 + \epsilon x^0). \]   (15.23)
This computation can also be interpreted as showing that the multiplication
of $g \in SU(2)$ on the right by $(I + i\epsilon\sigma_3)$ displaces the point g, changing its $x^i$
parameters by an amount

\[ \delta \begin{pmatrix} x^1 \\ x^2 \\ x^3 \end{pmatrix} = \epsilon \begin{pmatrix} -x^2 \\ x^1 \\ x^0 \end{pmatrix}. \]   (15.24)

Knowing how the displacement looks in terms of the $x^1$, $x^2$, $x^3$ co-ordinate
system lets us read off the $\partial/\partial x^\mu$ components of a vector $L_3$ lying in $TG_g$:

\[ L_3 = -x^2 \partial_1 + x^1 \partial_2 + x^0 \partial_3. \]   (15.25)


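Since g can be any point in the group, we have constructed a globally-defined
vector field $L_3$ that acts on a function F(g) on the group manifold as

\[ L_3 F(g) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left[ F(g(I + i\epsilon\sigma_3)) - F(g) \right]. \]   (15.26)

Similarly, we obtain

\[ L_1 = x^0 \partial_1 - x^3 \partial_2 + x^2 \partial_3, \]
\[ L_2 = x^3 \partial_1 + x^0 \partial_2 - x^1 \partial_3. \]   (15.27)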


The vector fields $L_i$ are said to be left invariant because the push-forward
of the vector $L_i(g)$ lying in the tangent space at g by multiplication on the
left by any $g'$ produces a vector $g'_*[L_i(g)]$ lying in the tangent space at $g'g$,
and this pushed-forward vector coincides with the $L_i(g'g)$ already there. We
can express this statement tersely as $g_* L_i = L_i$.
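The components of the left-invariant fields can be recovered directly from matrix multiplication. The sketch below is an illustration in standard-library Python, with hypothetical helper names: it builds g from its co-ordinates using the parametrization (15.16), multiplies on the right by $I + i\epsilon\sigma_k$, and reads off the rate of change of $(x^1, x^2, x^3)$. Because the product is linear in $\epsilon$, the difference quotient is exact up to rounding.

```python
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def su2(x0, x1, x2, x3):
    """Matrix x0 + i(x1 s1 + x2 s2 + x3 s3), laid out as in (15.16)."""
    return [[complex(x0, x3), complex(x2, x1)],
            [complex(-x2, x1), complex(x0, -x3)]]

def coords(u):
    """Recover (x0, x1, x2, x3) from the first row of the matrix."""
    return (u[0][0].real, u[0][1].imag, u[0][1].real, u[0][0].imag)

def right_shift(x, k, eps=1e-3):
    """Rate of change of (x1, x2, x3) under g -> g (I + i eps sigma_k)."""
    step = [[(1 if i == j else 0) + 1j * eps * SIGMA[k][i][j] for j in range(2)]
            for i in range(2)]
    new = coords(matmul(su2(*x), step))
    return tuple((new[a] - x[a]) / eps for a in (1, 2, 3))

def left_invariant_field(x, k):
    """Components of L_1, L_2, L_3 (k = 0, 1, 2) along d/dx1, d/dx2, d/dx3."""
    x0, x1, x2, x3 = x
    return [(x0, -x3, x2), (x3, x0, -x1), (-x2, x1, x0)][k]
```

At the sample point `x = (0.5, 0.5, 0.5, 0.5)`, `right_shift(x, 2)` should agree with `left_invariant_field(x, 2)`, which is the statement (15.25); the other two indices check the component expressions obtained for $L_1$ and $L_2$ in the same way.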
Using $\partial_i x^0 = -x^i/x^0$, i = 1, 2, 3, we can compute the Lie brackets and
find

\[ [L_1, L_2] = -2 L_3. \]   (15.28)

In general

\[ [L_i, L_j] = -2 \epsilon_{ijk} L_k, \]   (15.29)

which coincides with the matrix commutator of the $i\sigma_i$.

This construction works for all Lie groups. For each basis vector $L_i$ in the
tangent space at the identity e, we push it forward to the tangent space at g
by left multiplication by g, and so construct the global left-invariant vector
field $L_i$. The Lie bracket of these vector fields will be

\[ [L_i, L_j] = -f_{ij}{}^k L_k, \]   (15.30)

where the coefficients $f_{ij}{}^k$ are guaranteed to be position independent because
(see exercise 12.5) the operation of taking the Lie bracket of two vector fields
commutes with the operation of pushing-forward the vector fields.
Consequently, the Lie bracket at any point is just the image of the Lie bracket
calculated at the identity. When the group is a matrix group, this Lie bracket
will coincide with the commutator of the $i\lambda_i$, that group’s analogue of the
$i\sigma_i$ matrices.


The exponential map 


Recall that given a vector field $X = X^\mu \partial_\mu$, we define the associated flow by
solving the equation

\[ \frac{dx^\mu}{dt} = X^\mu(x(t)). \]   (15.31)

If we do this for the left-invariant vector field L, with initial condition
x(0) = e, we obtain a t-dependent group element g(x(t)), which we denote
by Exp(tL). The symbol “Exp” stands for the exponential map which takes
elements of the Lie algebra to elements of the Lie group. The reason for the
name and notation is that for matrix groups this operation corresponds to
the usual exponentiation of matrices. Elements of the matrix Lie group are
therefore exponentials of matrices in the Lie algebra. To see this, suppose
that $L_i$ is the left invariant vector field derived from $i\lambda_i$. Then the matrix

\[ g(t) = \exp(it\lambda_i) = I + it\lambda_i - \frac{1}{2} t^2 \lambda_i^2 - \frac{i}{3!} t^3 \lambda_i^3 + \cdots \]   (15.32)

is an element of the group, and

\[ g(t + \epsilon) = \exp(it\lambda_i) \exp(i\epsilon\lambda_i) = g(t) \left( I + i\epsilon\lambda_i + O(\epsilon^2) \right). \]   (15.33)

From this we deduce that

\[ \frac{d}{dt} g(t) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left[ g(t)(I + i\epsilon\lambda_i) - g(t) \right] = L_i\, g(t). \]   (15.34)

Since $\exp(it\lambda_i) = I$ when t = 0, we deduce that $\mathrm{Exp}(tL_i) = \exp(it\lambda_i)$.
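As a concrete illustration of the series (15.32), one can sum it numerically for $\lambda = \sigma_1$ and compare with the closed form $\cos t\, I + i \sin t\, \sigma_1$, which follows from $\sigma_1^2 = I$. This is a minimal sketch in standard-library Python; the truncation at 40 terms and the helper names are arbitrary choices.

```python
import math

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(m, terms=40):
    """Truncated power series I + m + m^2/2! + ... for a 2-by-2 matrix m."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def g(t):
    """exp(i t sigma_1) evaluated from the series."""
    return exp_series([[0j, 1j * t], [1j * t, 0j]])

def closed_form(t):
    """cos(t) I + i sin(t) sigma_1."""
    return [[math.cos(t), 1j * math.sin(t)], [1j * math.sin(t), math.cos(t)]]

def max_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))
```

The one-parameter subgroup property behind (15.33) can be checked in the same way: `max_diff(g(0.3 + 0.4), matmul(g(0.3), g(0.4)))` should vanish to rounding error.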


Right-invariant vector fields 


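We can also use multiplication on the right to push forward an infinitesimal
group element. For example:

\[ (I + i\epsilon\sigma_3) g = (I + i\epsilon\sigma_3)(x^0 + i x^1 \sigma_1 + i x^2 \sigma_2 + i x^3 \sigma_3) \]
\[ = (x^0 - \epsilon x^3) + i\sigma_1 (x^1 + \epsilon x^2) + i\sigma_2 (x^2 - \epsilon x^1) + i\sigma_3 (x^3 + \epsilon x^0). \]   (15.35)

This motion corresponds to the right-invariant vector field

\[ R_3 = x^2 \partial_1 - x^1 \partial_2 + x^0 \partial_3. \]   (15.36)

Similarly, we obtain

\[ R_1 = x^0 \partial_1 + x^3 \partial_2 - x^2 \partial_3, \]
\[ R_2 = -x^3 \partial_1 + x^0 \partial_2 + x^1 \partial_3, \]   (15.37)

and find that

\[ [R_1, R_2] = +2 R_3. \]   (15.38)

In general,

\[ [R_i, R_j] = +2 \epsilon_{ijk} R_k. \]   (15.39)

For any Lie group, the Lie brackets of the right-invariant fields will be

\[ [R_i, R_j] = +f_{ij}{}^k R_k \]   (15.40)

whenever

\[ [L_i, L_j] = -f_{ij}{}^k L_k \]   (15.41)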
are the Lie brackets of the left-invariant fields. The relative minus sign
between the bracket algebra of the left- and right-invariant vector fields has
the same origin as the relative sign between the commutators of space- and
body-fixed rotations in classical mechanics. Because multiplication from the
left does not interfere with multiplication from the right, the left and right
invariant fields commute:

\[ [L_i, R_j] = 0. \]   (15.42)


15.2.2 Maurer-Cartan forms


Suppose that g is an element of a group and dg denotes its exterior derivative.
Then the combination $dg\, g^{-1}$ is a Lie-algebra-valued one form. For example,
starting from the elements of SU(2)

\[ g = x^0 + i x^1 \sigma_1 + i x^2 \sigma_2 + i x^3 \sigma_3, \]
\[ g^{-1} = g^\dagger = x^0 - i x^1 \sigma_1 - i x^2 \sigma_2 - i x^3 \sigma_3, \]   (15.43)

we compute

\[ dg = dx^0 + i\, dx^1 \sigma_1 + i\, dx^2 \sigma_2 + i\, dx^3 \sigma_3 \]
\[ = (x^0)^{-1} (-x^1 dx^1 - x^2 dx^2 - x^3 dx^3) + i\, dx^1 \sigma_1 + i\, dx^2 \sigma_2 + i\, dx^3 \sigma_3. \]   (15.44)


From this we find

\[ dg\, g^{-1} = i\sigma_1 \left( (x^0 + (x^1)^2/x^0)\, dx^1 + (x^3 + (x^1 x^2)/x^0)\, dx^2 + (-x^2 + (x^1 x^3)/x^0)\, dx^3 \right) \]
\[ + i\sigma_2 \left( (-x^3 + (x^2 x^1)/x^0)\, dx^1 + (x^0 + (x^2)^2/x^0)\, dx^2 + (x^1 + (x^2 x^3)/x^0)\, dx^3 \right) \]
\[ + i\sigma_3 \left( (x^2 + (x^3 x^1)/x^0)\, dx^1 + (-x^1 + (x^3 x^2)/x^0)\, dx^2 + (x^0 + (x^3)^2/x^0)\, dx^3 \right). \]   (15.45)

Observe that the part proportional to the identity matrix has cancelled. The
result of inserting a vector $X^i \partial_i$ into $dg\, g^{-1}$ is therefore an element of the
Lie algebra of SU(2). This is what we mean when we say that $dg\, g^{-1}$ is
Lie-algebra-valued.

For a general group, we define the (right invariant) Maurer-Cartan forms
$\omega^i_R$ as being the coefficient of the Lie algebra generator $i\lambda_i$. Thus, for SU(2),
we have

\[ dg\, g^{-1} = \omega_R = (i\sigma_i)\, \omega^i_R. \]   (15.46)
If we evaluate the one-form $\omega^1_R$ on the right invariant vector field $R_1$, we find

\[ \omega^1_R(R_1) = (x^0 + (x^1)^2/x^0)\, x^0 + (x^3 + (x^1 x^2)/x^0)\, x^3 + (-x^2 + (x^1 x^3)/x^0)(-x^2) \]
\[ = (x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2 \]
\[ = 1. \]   (15.47)

Working similarly, we find

\[ \omega^1_R(R_2) = (x^0 + (x^1)^2/x^0)(-x^3) + (x^3 + (x^1 x^2)/x^0)\, x^0 + (-x^2 + (x^1 x^3)/x^0)\, x^1 \]
\[ = 0. \]   (15.48)

In general, we discover that $\omega^i_R(R_j) = \delta^i_j$. The Maurer-Cartan forms therefore
constitute the dual basis to the right-invariant vector fields.

We may similarly define the left-invariant Maurer-Cartan forms by

\[ g^{-1} dg = \omega_L = (i\sigma_i)\, \omega^i_L. \]   (15.49)

These obey $\omega^i_L(L_j) = \delta^i_j$, showing that the $\omega^i_L$ are the dual basis to the
left-invariant vector fields.
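The duality between the right-invariant forms and fields can be verified at a sample point. In the sketch below (standard-library Python, hypothetical function names, indices 0, 1, 2 standing for 1, 2, 3) the coefficient of $dx^a$ in $\omega^c_R$ is taken to be $x^0 \delta_{ca} + x^c x^a/x^0 + \epsilon_{abc} x^b$, which is a compact rewriting of the three rows of the expression for $dg\, g^{-1}$, and the field components are $R_1 = (x^0, x^3, -x^2)$, $R_2 = (-x^3, x^0, x^1)$, $R_3 = (x^2, -x^1, x^0)$.

```python
import math

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices drawn from 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2

def x_zero(x):
    """x^0 on the chart with x^0 > 0."""
    return math.sqrt(1.0 - sum(v * v for v in x))

def maurer_cartan_right(x):
    """w[c][a]: coefficient of dx^(a+1) in the one-form multiplying i sigma_(c+1)."""
    x0 = x_zero(x)
    return [[(x0 if a == c else 0.0) + x[c] * x[a] / x0
             + sum(levi_civita(a, b, c) * x[b] for b in range(3))
             for a in range(3)] for c in range(3)]

def right_fields(x):
    """Components of R_1, R_2, R_3 along d/dx1, d/dx2, d/dx3."""
    x0 = x_zero(x)
    x1, x2, x3 = x
    return [(x0, x3, -x2), (-x3, x0, x1), (x2, -x1, x0)]

def pairing(x):
    """Matrix of values w^i_R(R_j); it should be the identity matrix."""
    w, r = maurer_cartan_right(x), right_fields(x)
    return [[sum(w[i][a] * r[j][a] for a in range(3)) for j in range(3)]
            for i in range(3)]
```

Evaluating `pairing((0.1, 0.2, 0.3))` should return the 3-by-3 identity matrix to rounding error.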

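Acting with the exterior derivative d on $g g^{-1} = I$ tells us that $d(g^{-1}) =
-g^{-1} dg\, g^{-1}$. By exploiting this fact, together with the anti-derivation
property

\[ d(a \wedge b) = da \wedge b + (-1)^p a \wedge db, \]

for a p-form a, we may compute the exterior derivative of $\omega_R$. We find that

\[ d\omega_R = d(dg\, g^{-1}) = (dg\, g^{-1}) \wedge (dg\, g^{-1}) = \omega_R \wedge \omega_R. \]   (15.50)

A matrix product is implicit here. If it were not, the product of the two
identical 1-forms on the right would automatically be zero. When we make
this matrix structure explicit, we see that

\[ \omega_R \wedge \omega_R = \omega^i_R \wedge \omega^j_R\, (i\sigma_i)(i\sigma_j) \]
\[ = \frac{1}{2}\, \omega^i_R \wedge \omega^j_R\, [i\sigma_i, i\sigma_j] \]
\[ = -\frac{1}{2} f_{ij}{}^k (i\sigma_k)\, \omega^i_R \wedge \omega^j_R, \]   (15.51)

so

\[ d\omega^k_R = -\frac{1}{2} f_{ij}{}^k\, \omega^i_R \wedge \omega^j_R. \]   (15.52)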
These equations are known as the Maurer-Cartan relations for the
right-invariant forms.

For the left-invariant forms we have

\[ d\omega_L = d(g^{-1} dg) = -(g^{-1} dg) \wedge (g^{-1} dg) = -\omega_L \wedge \omega_L, \]   (15.53)

or

\[ d\omega^k_L = +\frac{1}{2} f_{ij}{}^k\, \omega^i_L \wedge \omega^j_L. \]   (15.54)

The Maurer-Cartan relations appear in the physics literature when we
quantize gauge theories. They are one part of the BRST transformations of
the Faddeev-Popov ghost fields. We will provide a further discussion of these
transformations in the next chapter.


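15.2.3 Euler angles


In physics it is common to use Euler angles to parameterize the group SU(2).
We can write an arbitrary SU(2) matrix U as a product

\[ U = \exp\{-i\phi\sigma_3/2\} \exp\{-i\theta\sigma_2/2\} \exp\{-i\psi\sigma_3/2\} \]
\[ = \begin{pmatrix} e^{-i\phi/2} & 0 \\ 0 & e^{i\phi/2} \end{pmatrix}
     \begin{pmatrix} \cos\theta/2 & -\sin\theta/2 \\ \sin\theta/2 & \cos\theta/2 \end{pmatrix}
     \begin{pmatrix} e^{-i\psi/2} & 0 \\ 0 & e^{i\psi/2} \end{pmatrix} \]
\[ = \begin{pmatrix} e^{-i(\phi+\psi)/2} \cos\theta/2 & -e^{i(\psi-\phi)/2} \sin\theta/2 \\
                     e^{i(\phi-\psi)/2} \sin\theta/2 & e^{+i(\psi+\phi)/2} \cos\theta/2 \end{pmatrix}. \]   (15.55)

Comparing with the earlier expression for U in terms of the co-ordinates $x^\mu$,
we obtain the Euler-angle parameterization of the three-sphere

\[ x^0 = \cos\theta/2 \cos(\psi + \phi)/2, \]
\[ x^1 = \sin\theta/2 \sin(\phi - \psi)/2, \]
\[ x^2 = -\sin\theta/2 \cos(\phi - \psi)/2, \]
\[ x^3 = -\cos\theta/2 \sin(\psi + \phi)/2. \]   (15.56)

When the angles are taken in the range $0 \le \phi < 2\pi$, $0 \le \theta < \pi$, $0 \le \psi < 4\pi$
we cover the entire three-sphere exactly once.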
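The Euler-angle product and the resulting co-ordinates on the three-sphere can be cross-checked numerically. The sketch below (standard-library Python, illustrative names) multiplies the three factors $\exp\{-i\phi\sigma_3/2\} \exp\{-i\theta\sigma_2/2\} \exp\{-i\psi\sigma_3/2\}$ and reads the $x^\mu$ off the first row of the product, assuming the layout $U_{11} = x^0 + i x^3$, $U_{12} = i x^1 + x^2$ used for the general SU(2) matrix.

```python
import cmath
import math

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def euler_su2(phi, theta, psi):
    """Product exp(-i phi s3/2) exp(-i theta s2/2) exp(-i psi s3/2)."""
    def about_3(angle):
        return [[cmath.exp(-0.5j * angle), 0], [0, cmath.exp(0.5j * angle)]]
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    about_2 = [[c, -s], [s, c]]
    return matmul(matmul(about_3(phi), about_2), about_3(psi))

def coords(u):
    """(x0, x1, x2, x3) with U11 = x0 + i x3 and U12 = i x1 + x2."""
    return (u[0][0].real, u[0][1].imag, u[0][1].real, u[0][0].imag)

def euler_coords(phi, theta, psi):
    """The Euler-angle parameterization of the three-sphere."""
    return (math.cos(theta / 2) * math.cos((psi + phi) / 2),
            math.sin(theta / 2) * math.sin((phi - psi) / 2),
            -math.sin(theta / 2) * math.cos((phi - psi) / 2),
            -math.cos(theta / 2) * math.sin((psi + phi) / 2))
```

For any choice of angles, `coords(euler_su2(phi, theta, psi))` should agree with `euler_coords(phi, theta, psi)`, and the four numbers should have unit sum of squares.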


Exercise 15.5: Show that the Hopf map, defined in chapter 3, $\mathrm{Hopf} : S^3 \to S^2$
is the “forgetful” map $(\theta, \phi, \psi) \to (\theta, \phi)$, where $\theta$ and $\phi$ are spherical polar
co-ordinates on the two-sphere.
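Exercise 15.6: Show that

\[ U^{-1} dU = -\frac{i}{2} \sigma_i \Omega^i_L, \]

where

\[ \Omega^1_L = \sin\psi\, d\theta - \sin\theta \cos\psi\, d\phi, \]
\[ \Omega^2_L = \cos\psi\, d\theta + \sin\theta \sin\psi\, d\phi, \]
\[ \Omega^3_L = d\psi + \cos\theta\, d\phi. \]

Observe that these 1-forms are essentially the components

\[ \omega_X = \sin\psi\, \dot\theta - \sin\theta \cos\psi\, \dot\phi, \]
\[ \omega_Y = \cos\psi\, \dot\theta + \sin\theta \sin\psi\, \dot\phi, \]
\[ \omega_Z = \dot\psi + \cos\theta\, \dot\phi, \]

of the angular velocity $\omega$ of a body with respect to the body-fixed XYZ axes
in the Euler-angle conventions of exercise 11.17.

Similarly, show that

\[ dU\, U^{-1} = -\frac{i}{2} \sigma_i \Omega^i_R, \]

where

\[ \Omega^1_R = -\sin\phi\, d\theta + \sin\theta \cos\phi\, d\psi, \]
\[ \Omega^2_R = \cos\phi\, d\theta + \sin\theta \sin\phi\, d\psi, \]
\[ \Omega^3_R = d\phi + \cos\theta\, d\psi. \]

Compute the components $\omega_x$, $\omega_y$, $\omega_z$ of the same angular velocity vector $\omega$, but
now taken with respect to the space-fixed xyz frame. Compare your answer
with the $\Omega^i_R$.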


15.2.4 Volume and metric 


The manifold of any Lie group has a natural metric, which is obtained by 
transporting the Killing form (see section 15.3.2) from the tangent space at 
the identity to any other point g by either left or right multiplication by 
g. For a compact group, the resultant left- and right-invariant metrics
coincide. In the particular case of SU(2) this metric is the usual metric on 
the three-sphere. 
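Using the Euler angle expression for the $x^\mu$ to compute the $dx^\mu$, we can
express the metric on the sphere as

\[ ds^2 = (dx^0)^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2 \]
\[ = \frac{1}{4} \left( d\theta^2 + \cos^2\theta/2\, (d\psi + d\phi)^2 + \sin^2\theta/2\, (d\psi - d\phi)^2 \right) \]
\[ = \frac{1}{4} \left( d\theta^2 + d\psi^2 + d\phi^2 + 2\cos\theta\, d\phi\, d\psi \right). \]   (15.57)

Here, to save space, we have used the traditional physics way of writing a
metric. In the more formal notation, where we think of the metric as being
a bilinear function, we would write the last line as

\[ g(\ ,\ ) = \frac{1}{4} \left( d\theta \otimes d\theta + d\psi \otimes d\psi + d\phi \otimes d\phi + \cos\theta\, (d\phi \otimes d\psi + d\psi \otimes d\phi) \right). \]   (15.58)

From (15.58) we find

\[ g = \det(g_{\mu\nu}) = \frac{1}{4^3} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & \cos\theta \\ 0 & \cos\theta & 1 \end{vmatrix}
     = \frac{1}{64} (1 - \cos^2\theta) = \frac{1}{64} \sin^2\theta. \]   (15.59)

The volume element, $\sqrt{g}\, d\theta\, d\phi\, d\psi$, is therefore

\[ d(\mathrm{Volume}) = \frac{1}{8} \sin\theta\, d\theta\, d\phi\, d\psi, \]   (15.60)

and the total volume of the sphere is

\[ \mathrm{Vol}(S^3) = \frac{1}{8} \int_0^\pi \sin\theta\, d\theta \int_0^{2\pi} d\phi \int_0^{4\pi} d\psi = 2\pi^2. \]   (15.61)

This volume coincides, for d = 4, with the standard expression for the volume
of $S^{d-1}$, the surface of the d-dimensional unit ball,

\[ \mathrm{Vol}(S^{d-1}) = \frac{2\pi^{d/2}}{\Gamma(d/2)}. \]   (15.62)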
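Both routes to the volume of the three-sphere are simple to evaluate numerically. The following sketch (standard-library Python; the midpoint rule and the number of steps are arbitrary illustrative choices) integrates the Euler-angle volume element and compares the result with the Gamma-function formula.

```python
import math

def sphere_volume(d):
    """Volume of S^(d-1), the surface of the d-dimensional unit ball."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def euler_angle_volume(steps=2000):
    """Integrate (1/8) sin(theta) over 0<theta<pi, 0<phi<2pi, 0<psi<4pi.

    The phi and psi integrals are trivial; the theta integral uses the
    midpoint rule.
    """
    h = math.pi / steps
    theta_integral = h * sum(math.sin((k + 0.5) * h) for k in range(steps))
    return theta_integral * (2.0 * math.pi) * (4.0 * math.pi) / 8.0
```

Here `sphere_volume(4)` and `euler_angle_volume()` should both be close to $2\pi^2 \approx 19.74$, while `sphere_volume(2)` and `sphere_volume(3)` reproduce the familiar $2\pi$ and $4\pi$.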
Exercise 15.7: Evaluate the Maurer-Cartan form $\omega^3_L$ in terms of the Euler
angle parameterization, and hence show that

\[ i\omega^3_L = \frac{1}{2} \mathrm{tr}\,(\sigma_3 U^{-1} dU) = -\frac{i}{2} (d\psi + \cos\theta\, d\phi). \]

Now recall that the Hopf map takes the point on the three-sphere with Euler
angle co-ordinates $(\theta, \phi, \psi)$ to the point on the two-sphere with spherical polar
co-ordinates $(\theta, \phi)$. Thus, if we set $A = -d\psi - \cos\theta\, d\phi$, then we find

\[ F \equiv dA = \sin\theta\, d\theta\, d\phi = \mathrm{Hopf}^* \left( d[\mathrm{Area}\ S^2] \right). \]

Also observe that

\[ A \wedge F = -\sin\theta\, d\theta\, d\phi\, d\psi. \]

From this, show that the Hopf index of the Hopf map itself is equal to

\[ \frac{1}{16\pi^2} \int_{S^3} A \wedge F = -1. \]


Exercise 15.8: Show that for U the defining two-by-two matrices of SU(2), we
have

\[ \int_{SU(2)} \mathrm{tr}\, [(U^{-1} dU)^3] = 24\pi^2. \]

Suppose we have a map $g : R^3 \to SU(2)$ such that g(x) goes to the identity
element at infinity. Consider the integral

\[ S[g] = \frac{1}{24\pi^2} \int_{R^3} \mathrm{tr}\, (g^{-1} dg)^3, \]

where the 3-form $\mathrm{tr}\, (g^{-1} dg)^3$ is the pull-back to $R^3$ of the form $\mathrm{tr}\, [(U^{-1} dU)^3]$
on SU(2). Show that if we make the variation $g \to g + \delta g$, then

\[ \delta S[g] = \frac{1}{24\pi^2} \int_{R^3} d \left\{ 3\, \mathrm{tr} \left( (g^{-1} \delta g)(g^{-1} dg)^2 \right) \right\} = 0, \]

and so S[g] is a topological invariant of the map g. Conclude that the functional
S[g] is an integer, that integer being the Brouwer degree, or winding number,
of the map $g : S^3 \to S^3$.


Exercise 15.9: Generalize the result of the previous problem to show, for any
mapping $x \mapsto g(x)$ into a Lie group G, and for n an odd integer, that the
n-form $\mathrm{tr}\, (g^{-1} dg)^n$ constructed from the Maurer-Cartan form is closed, and
that

\[ \delta\, \mathrm{tr}\, (g^{-1} dg)^n = d \left\{ n\, \mathrm{tr} \left( (g^{-1} \delta g)(g^{-1} dg)^{n-1} \right) \right\}. \]

(Note that for even n the trace of $(g^{-1} dg)^n$ vanishes identically.)
15.2.5 $SO(3) \simeq SU(2)/Z_2$


The groups SU(2) and SO(3) are locally isomorphic. They have the same 
Lie algebra, but differ in their global topology. Although rotations in space 
are elements of SO(3), electrons respond to these rotations by transforming 
under the two-dimensional defining representation of SU(2). As we shall see, 
this means that after a rotation through $2\pi$ the electron wavefunction comes
back to minus itself. The resulting orientation entanglement is characteristic
of the spinor representation of rotations and is intimately connected with
the Fermi statistics of the electron. The spin representations were discovered
by Élie Cartan in 1913, some years before they were needed in physics.

The simplest way to motivate the spin/rotation connection is via the
Pauli sigma matrices. These matrices are hermitian, traceless, and obey

\[ \sigma_i \sigma_j + \sigma_j \sigma_i = 2\delta_{ij} I. \]   (15.63)

If, for any $U \in SU(2)$, we define

\[ \sigma'_i = U \sigma_i U^{-1}, \]   (15.64)

then the $\sigma'_i$ are also hermitian, traceless, and obey (15.63). Since the original
$\sigma_i$ form a basis for the space of hermitian traceless matrices, we must have

\[ \sigma'_i = \sigma_j R_{ji} \]   (15.65)

for some real 3-by-3 matrix having entries $R_{ij}$. From (15.63) we find that

\[ 2\delta_{ij} = \sigma'_i \sigma'_j + \sigma'_j \sigma'_i \]
\[ = (\sigma_l R_{li})(\sigma_m R_{mj}) + (\sigma_m R_{mj})(\sigma_l R_{li}) \]
\[ = (\sigma_l \sigma_m + \sigma_m \sigma_l) R_{li} R_{mj} \]
\[ = 2\delta_{lm} R_{li} R_{mj}. \]

Thus

\[ R_{mi} R_{mj} = \delta_{ij}. \]   (15.66)

In other words, $R^T R = I$, and so R is an element of O(3). Now the determinant
of any orthogonal matrix is $\pm 1$, but the manifold of SU(2) is a connected
set and R = I when U = I. Since a continuous map from a connected set to
the integers must be a constant, we conclude that det R = 1 for all U. The
R matrices are therefore in SO(3).
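The map from U to R can be made explicit by projecting $U \sigma_i U^{-1}$ back onto the Pauli matrices with the trace, $R_{ji} = \frac{1}{2} \mathrm{tr}(\sigma_j U \sigma_i U^{-1})$, which follows from $\mathrm{tr}(\sigma_j \sigma_k) = 2\delta_{jk}$. The sketch below is an illustration in standard-library Python with hypothetical helper names; U is built from a point on the three-sphere.

```python
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def dagger(u):
    """Conjugate transpose; equals the inverse for a unitary matrix."""
    return [[complex(u[j][i]).conjugate() for j in range(2)] for i in range(2)]

def su2(x0, x1, x2, x3):
    """x0 + i(x1 s1 + x2 s2 + x3 s3) for (x0, x1, x2, x3) on the unit 3-sphere."""
    return [[complex(x0, x3), complex(x2, x1)],
            [complex(-x2, x1), complex(x0, -x3)]]

def rotation_from_su2(u):
    """R[j][i] = (1/2) tr(sigma_j U sigma_i U^dagger)."""
    ud = dagger(u)
    rot = []
    for j in range(3):
        row = []
        for i in range(3):
            m = matmul(matmul(SIGMA[j], u), matmul(SIGMA[i], ud))
            row.append(0.5 * (m[0][0] + m[1][1]).real)
        rot.append(row)
    return rot

def det3(r):
    return (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
            - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
            + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))
```

For `U = su2(0.5, 0.5, 0.5, 0.5)` the resulting matrix satisfies $R^T R = I$ with `det3(R)` equal to +1, and `su2(-0.5, -0.5, -0.5, -0.5)`, which is -U, gives the same R. This two-to-one behaviour is what the $Z_2$ in the section title refers to.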
We now exploit the principle of the sextant to show that the correspondence
goes both ways, i.e. we can find a U(R) for any element $R \in SO(3)$.


[Figure 15.1 diagram. Labels: To sun; To horizon; Telescope; Fixed,
half-silvered mirror (left-hand half of fixed mirror is silvered, right-hand
half is transparent); View through telescope of sun brought down to touch
horizon.]


Figure 15.1: The sextant. The telescope and the half-silvered mirror are fixed
to the frame of the instrument, which also holds the scale. The second mirror
and attached pointer pivot so that the angle $\theta$ between the mirrors can be
varied and accurately recorded. The scale is calibrated so as to display the
altitude $2\theta$. For the configuration shown, $\theta = 15°$ while the pointer indicates
that the sun is 30° above the horizon.


This familiar instrument is used to measure the altitude of the sun above the 
horizon while standing on the unsteady deck of a ship at sea. A theodolite or 
similar device would be rendered useless by the ship’s pitching and rolling. 
The sextant exploits the fact that successive reflection in two mirrors inclined
at an angle $\theta$ to one another serves to rotate the image through an angle $2\theta$
about the line of intersection of the mirror planes. This rotation is used to
superimpose the image of the sun onto the image of the horizon, where it 
stays even if the instrument is rocked back and forth. Exactly the same trick 
is used in constructing the spinor representations of the rotation group. 
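Consider a vector ${\bf x}$ with components $x^i$ and form the matrix $\hat{x} = x^i\hat\sigma_i$.
Now, if ${\bf n}$ is a unit vector with components $n^i$, then

$$(-\hat\sigma_i n^i)\,\hat{x}\,(\hat\sigma_k n^k) = \left(x^j - 2({\bf n}\cdot{\bf x})\,n^j\right)\hat\sigma_j = \hat{x} - 2({\bf n}\cdot{\bf x})\,\hat{n}. \qquad (15.67)$$

The vector ${\bf x} - 2({\bf n}\cdot{\bf x}){\bf n}$ is the result of reflecting ${\bf x}$ in the plane perpendicular
to ${\bf n}$. Consequently

$$-(\hat\sigma_1\cos\theta/2 + \hat\sigma_2\sin\theta/2)(-\hat\sigma_1)\,\hat{x}\,(\hat\sigma_1)(\hat\sigma_1\cos\theta/2 + \hat\sigma_2\sin\theta/2) \qquad (15.68)$$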


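performs two successive reflections on ${\bf x}$. The first, a reflection in the "1"
plane, is performed by the inner $\hat\sigma_1$'s. The second reflection, in a plane at
an angle $\theta/2$ to the "1" plane, is performed by the $(\hat\sigma_1\cos\theta/2 + \hat\sigma_2\sin\theta/2)$'s.
Multiplying out the factors, and using the $\hat\sigma_i$ algebra, we find

$$\begin{aligned}
&(\cos\theta/2 - \hat\sigma_1\hat\sigma_2\sin\theta/2)\,\hat{x}\,(\cos\theta/2 + \hat\sigma_1\hat\sigma_2\sin\theta/2) \\
&\quad = \hat\sigma_1(\cos\theta\,x^1 - \sin\theta\,x^2) + \hat\sigma_2(\sin\theta\,x^1 + \cos\theta\,x^2) + \hat\sigma_3x^3. \qquad (15.69)
\end{aligned}$$

The effect on ${\bf x}$ is a rotation

$$\begin{aligned}
x^1 &\mapsto \cos\theta\,x^1 - \sin\theta\,x^2, \\
x^2 &\mapsto \sin\theta\,x^1 + \cos\theta\,x^2, \\
x^3 &\mapsto x^3, \qquad (15.70)
\end{aligned}$$

through the angle $\theta$ about the 3 axis. We can drop the $x^i$ and re-express
(15.69) as

$$U\hat\sigma_iU^{-1} = \hat\sigma_jR_{ji}, \qquad (15.71)$$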


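where $R_{ij}$ is the 3-by-3 rotation matrix

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (15.72)$$

and

$$U = \exp\left\{-\frac{i}{2}\hat\sigma_3\theta\right\} = \exp\left\{-i\,\frac{1}{4i}[\hat\sigma_1,\hat\sigma_2]\,\theta\right\} \qquad (15.73)$$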


is an element of SU(2). We have exhibited two ways of writing the exponents
in (15.73) because the subscript 3 on $\hat\sigma_3$ indicates the axis about which we
are rotating, while the 1, 2 in $[\hat\sigma_1, \hat\sigma_2]$ indicate the plane in which the rotation
occurs. It is the second language that generalizes to higher dimensions. More
on the use of mirrors for creating and combining rotations can be found in
§41.1 of Misner, Thorne, and Wheeler's Gravitation.
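As a quick numerical illustration of this rotation about the 3 axis, the sketch below (an addition, with hypothetical helper names) builds $U = \exp\{-\frac{i}{2}\hat\sigma_3\theta\}$ as a diagonal matrix and compares $U\hat\sigma_iU^{-1}$ with $\hat\sigma_jR_{ji}$ entry by entry, using the standard rotation matrix about the 3 axis for $R$. It also evaluates $U$ at $\theta = 2\pi$ and $\theta = 4\pi$.

```python
import cmath
import math

SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Hermitian conjugate; equals the inverse for a unitary matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def u_about_3(theta):
    """U = exp(-i theta sigma_3 / 2) = diag(e^{-i theta/2}, e^{+i theta/2})."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

def rotation_about_3(theta):
    """The 3-by-3 matrix R for a rotation through theta about the 3 axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def max_mismatch(theta):
    """Largest entry of U sigma_i U^{-1} - sigma_j R_ji over all i."""
    u = u_about_3(theta)
    ud = dagger(u)
    r = rotation_about_3(theta)
    worst = 0.0
    for i in range(3):
        lhs = matmul(matmul(u, SIGMA[i]), ud)
        for a in range(2):
            for b in range(2):
                rhs = sum(SIGMA[j][a][b] * r[j][i] for j in range(3))
                worst = max(worst, abs(lhs[a][b] - rhs))
    return worst
```

Up to rounding, `max_mismatch` vanishes for every angle, while `u_about_3(2 * math.pi)` is minus the identity and `u_about_3(4 * math.pi)` is the identity.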

This mirror construction shows that for any $R \in {\rm SO}(3)$ there is a two-dimensional
unitary matrix $U(R)$ such that

$$U(R)\hat\sigma_iU^{-1}(R) = \hat\sigma_jR_{ji}. \qquad (15.74)$$

This $U(R)$ is not unique, however. If $U \in {\rm SU}(2)$ then so is $-U$. Furthermore

$$U(R)\hat\sigma_iU^{-1}(R) = (-U(R))\hat\sigma_i(-U(R))^{-1}, \qquad (15.75)$$

and so $U(R)$ and $-U(R)$ implement exactly the same rotation $R$. Conversely,
if two SU(2) matrices $U$, $V$ obey

$$U\hat\sigma_iU^{-1} = V\hat\sigma_iV^{-1} \qquad (15.76)$$

then $V^{-1}U$ commutes with all 2-by-2 matrices and, by Schur's lemma, must
be a multiple of the identity. But if $\lambda I \in {\rm SU}(2)$ then $\lambda = \pm 1$. Thus,
$U = \pm V$. The mapping between SU(2) and SO(3) is therefore two-to-one.
Since U and —U correspond to the same R, the group manifold of SO(3) 
is the three-sphere with antipodal points identified. Unlike the two-sphere, 
where the identification of antipodal points gives the non-orientable projec- 
tive plane, the SO(3) group manifold remains orientable. It is not, however, 
simply connected: a path on the three-sphere from a point to its antipode
forms a closed loop in SO(3), but one that is not contractible to a point. If
we continue on from the antipode back to the original point, the complete
path is contractible. This means that the first homotopy group, the group
$\pi_1({\rm SO}(3))$ of homotopy classes of based closed paths in SO(3) with composition
given by concatenation, is isomorphic to $\mathbb{Z}_2$. This two-element group encodes the topology behind the
Balinese Candle Dance, and keeps track of whether a sequence of rotations
that eventually brings a spin-$\frac12$ particle back to its original orientation should
be counted as a $360^\circ$ rotation ($U = -I$) or a $720^\circ \sim 0^\circ$ rotation ($U = +I$).


Exercise 15.10: Verify that

$$U(R)\hat\sigma_iU^{-1}(R) = \hat\sigma_jR_{ji}$$

is consistent with $U(R_2)U(R_1) = \pm U(R_2R_1)$.
Spinor representations of SO($N$)


The mirror trick can be extended to perform rotations in $N$ dimensions. We
replace the three $\hat\sigma_i$ matrices by a set of $N$ Dirac gamma matrices, which
obey the defining relations of a Clifford algebra:

$$\hat\gamma_\mu\hat\gamma_\nu + \hat\gamma_\nu\hat\gamma_\mu = 2\delta_{\mu\nu}. \qquad (15.77)$$

These relations are a generalization of the key algebraic property of the Pauli
sigma matrices.

If $N$ ($= 2n$) is even, then we can find $2^n$-by-$2^n$ hermitian matrices $\hat\gamma_\mu$
satisfying this algebra. If $N$ ($= 2n+1$) is odd, we append to the matrices for
$N = 2n$ the hermitian matrix $\hat\gamma_{2n+1} = -(i)^n\hat\gamma_1\hat\gamma_2\cdots\hat\gamma_{2n}$, which obeys $\hat\gamma_{2n+1}^2 = 1$
and anti-commutes with all the other $\hat\gamma_\mu$. The $\hat\gamma$ matrices therefore act on
a $2^{\lfloor N/2\rfloor}$-dimensional space, where the symbol $\lfloor N/2\rfloor$ denotes the integer part
of $N/2$.

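The $\hat\gamma$'s do not form a Lie algebra as they stand, but a rotation through
$\theta$ in the $\mu\nu$-plane is obtained from

$$\exp\left\{-i\,\frac{1}{4i}[\hat\gamma_\mu,\hat\gamma_\nu]\,\theta\right\}\hat\gamma_i\exp\left\{i\,\frac{1}{4i}[\hat\gamma_\mu,\hat\gamma_\nu]\,\theta\right\} = \hat\gamma_jR_{ji}, \qquad (15.78)$$

and we find that the hermitian matrices $\hat\Gamma_{\mu\nu} = \frac{1}{4i}[\hat\gamma_\mu,\hat\gamma_\nu]$ form a basis for the
Lie algebra of SO($N$). The $2^{\lfloor N/2\rfloor}$-dimensional space on which they act is the
Dirac spinor representation of SO($N$). Although the matrices $\exp\{i\hat\Gamma_{\mu\nu}\theta^{\mu\nu}\}$
are unitary, they are not the entirety of U($2^{\lfloor N/2\rfloor}$), but instead constitute a
subgroup called Spin($N$).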

If $N$ is even then we can still construct the matrix $\hat\gamma_{2n+1}$ that anti-commutes
with all the other $\hat\gamma_\mu$'s. It cannot be the identity matrix, therefore,
but it commutes with all the $\hat\Gamma_{\mu\nu}$. By Schur's lemma, this means that the
SO($2n$) Dirac spinor representation space $V$ is reducible. Now, $\hat\gamma_{2n+1}^2 = I$,
and so $\hat\gamma_{2n+1}$ has eigenvalues $\pm 1$. The two eigenspaces are invariant under
the action of the group, and thus the Dirac spinor space decomposes into two
irreducible Weyl spinor representations:

$$V = V_{\rm odd} \oplus V_{\rm even}. \qquad (15.79)$$

Here $V_{\rm even}$ and $V_{\rm odd}$, the plus and minus eigenspaces of $\hat\gamma_{2n+1}$, are called the
spaces of right and left chirality. When $N$ is odd the spinor representation
is irreducible.
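One concrete way to see that $2^n$-by-$2^n$ gamma matrices exist is to build them as tensor products of Pauli matrices. The sketch below is an illustration only: the particular tensor-product (Jordan-Wigner-style) construction and the function names are assumptions, not the construction used in the text. It builds $2n$ matrices, forms the product $-(i)^n\hat\gamma_1\cdots\hat\gamma_{2n}$, and supplies helpers for checking the Clifford relations.

```python
S1 = [[0, 1], [1, 0]]
S2 = [[0, -1j], [1j, 0]]
S3 = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]

def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gamma_matrices(n):
    """Return 2n matrices of size 2^n obeying the Clifford relations.

    Assumed construction: a string of sigma_3 factors, then sigma_1 or
    sigma_2 on site k, then identity factors.
    """
    gammas = []
    for k in range(n):
        for s in (S1, S2):
            m = [[1]]
            for site in range(n):
                m = kron(m, S3 if site < k else (s if site == k else I2))
            gammas.append(m)
    return gammas

def chirality(gammas):
    """gamma_{2n+1} = -(i)^n gamma_1 gamma_2 ... gamma_{2n}."""
    n = len(gammas) // 2
    prod = gammas[0]
    for g in gammas[1:]:
        prod = matmul(prod, g)
    return [[-(1j ** n) * x for x in row] for row in prod]

def anticommutator(a, b):
    """The matrix ab + ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(a))] for i in range(len(a))]

def is_scalar_multiple_of_identity(m, c, tol=1e-12):
    """True if m equals c times the identity matrix, within tol."""
    return all(abs(m[i][j] - (c if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))
```

For $n = 2$ this gives four 4-by-4 matrices, and `chirality` returns a traceless matrix that squares to the identity and anti-commutes with each of them.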
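Exercise 15.11: Starting from the defining relations of the Clifford algebra
(15.77) show that, for $N = 2n$,

$$\begin{aligned}
{\rm tr}(\hat\gamma_\mu) &= 0, \\
{\rm tr}(\hat\gamma_{2n+1}) &= 0, \\
{\rm tr}(\hat\gamma_\mu\hat\gamma_\nu) &= {\rm tr}(I)\,\delta_{\mu\nu}, \\
{\rm tr}(\hat\gamma_\mu\hat\gamma_\nu\hat\gamma_\sigma) &= 0, \\
{\rm tr}(\hat\gamma_\mu\hat\gamma_\nu\hat\gamma_\sigma\hat\gamma_\tau) &= {\rm tr}(I)\,(\delta_{\mu\nu}\delta_{\sigma\tau} - \delta_{\mu\sigma}\delta_{\nu\tau} + \delta_{\mu\tau}\delta_{\nu\sigma}).
\end{aligned}$$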


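Exercise 15.12: Consider the space $\Omega(\mathbb{C}) = \bigoplus_p\Omega^p(\mathbb{C})$ of complex-valued skew
symmetric tensors $A_{\mu_1\ldots\mu_p}$ for $0 \le p \le N = 2n$. Let

$$\psi_{\alpha\beta} = \sum_{p=0}^{N}\frac{1}{p!}\left(\hat\gamma_{\mu_1}\cdots\hat\gamma_{\mu_p}\right)_{\alpha\beta}A_{\mu_1\ldots\mu_p}$$

define a mapping from $\Omega(\mathbb{C})$ into the space of complex matrices of the same
size as the $\hat\gamma_\mu$. Show that this mapping is invertible, i.e. given $\psi_{\alpha\beta}$ we can
recover the $A_{\mu_1\ldots\mu_p}$. By showing that the dimension of $\Omega(\mathbb{C})$ is $2^N$ deduce
that the $\hat\gamma_\mu$ must be at least $2^n$-by-$2^n$ matrices.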


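Exercise 15.13: Show that the $\mathbb{R}^{2n}$ Dirac operator $\slashed{D} = \hat\gamma_\mu\partial_\mu$ obeys $\slashed{D}^2 = \nabla^2$.
Recall that the Hodge operator $d - \delta$ from section 13.7.1 is also a "square root"
of the Laplacian:

$$(d - \delta)^2 = -(d\delta + \delta d) = \nabla^2.$$

Show that

$$\psi_{\alpha\beta} \to (\slashed{D}\psi)_{\alpha\beta} = (\hat\gamma_\mu)_{\alpha\alpha'}\,\partial_\mu\psi_{\alpha'\beta}$$

corresponds to the action of $d - \delta$ on the space $\Omega(\mathbb{R}^{2n}, \mathbb{C})$ of differential forms

$$A = \frac{1}{p!}A_{\mu_1\ldots\mu_p}(x)\,dx^{\mu_1}\cdots dx^{\mu_p}.$$

The space of complex-valued differential forms has thus been made to look like
a collection of $2^n$ Dirac spinor fields, one for each value of the "flavour index"
$\beta$. These $\psi_{\alpha\beta}$ are called Kähler-Dirac fields. They are not really flavoured
spinors because a rotation transforms both the $\alpha$ and $\beta$ indices.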


Exercise 15.14: That a set of $2n$ Dirac $\hat\gamma$'s has a $2^n$-by-$2^n$ matrix representation
is most naturally established by using the tools of second quantization.
To this end, let $a_i$, $a_i^\dagger$, $i = 1,\ldots,n$, be a set of anti-commuting annihilation and
creation operators obeying

$$a_ia_j + a_ja_i = 0, \qquad a_ia_j^\dagger + a_j^\dagger a_i = \delta_{ij},$$

and let $|0\rangle$ be the "no particle" state for which $a_i|0\rangle = 0$, $i = 1,\ldots,n$. Then
the $2^n$ states

$$|m_1,\ldots,m_n\rangle = (a_1^\dagger)^{m_1}\cdots(a_n^\dagger)^{m_n}|0\rangle,$$

where the $m_i$ take the value 0 or 1, constitute a basis for a space on which the
$a_i$ and $a_i^\dagger$ act irreducibly. Show that the $2n$ operators

$$\begin{aligned}
\hat\gamma_i &= a_i + a_i^\dagger, \\
\hat\gamma_{i+n} &= i(a_i - a_i^\dagger),
\end{aligned}$$

obey

$$\hat\gamma_\mu\hat\gamma_\nu + \hat\gamma_\nu\hat\gamma_\mu = 2\delta_{\mu\nu},$$

and hence can be represented by $2^n$-by-$2^n$ matrices. Deduce further that spaces
of left and right chirality are the spaces of odd or even "particle number."


The adjoint representation 


We established the connection between SU(2) and SO(3) by means of a conjugation:
$\hat\sigma_i \to U\hat\sigma_iU^{-1}$. The idea of obtaining a representation by conjugation
works for an arbitrary Lie group. It is easiest, however, to describe in the case
of a matrix group where we can consider an infinitesimal element $I + i\epsilon^i\hat\lambda_i$.
The conjugate element $g(I + i\epsilon^i\hat\lambda_i)g^{-1}$ will also be an infinitesimal element.
Since $gIg^{-1} = I$, this means that $g(i\epsilon^i\hat\lambda_i)g^{-1}$ must be expressible as a linear
combination of the $i\hat\lambda_i$ matrices. Consequently, we can define a linear map
acting on the element $X = \epsilon^i\hat\lambda_i$ of the Lie algebra by setting

$${\rm Ad}(g)\hat\lambda_i \equiv g\hat\lambda_ig^{-1} = \hat\lambda_j[{\rm Ad}(g)]^j{}_i. \qquad (15.80)$$

The matrices with entries $[{\rm Ad}(g)]^j{}_i$ form the adjoint representation of the
group. The dimension of the adjoint representation coincides with that of
the group manifold. The spinor construction shows that the defining representation
of SO(3) is the adjoint representation of SU(2).

For a general Lie group, we make Ad($g$) act on a vector in the tangent
space at the identity by pushing the vector forward to $TG_g$ by left multiplication
by $g$, and then pushing it back from $TG_g$ to $TG_e$ by right multiplication
by $g^{-1}$.
Exercise 15.15: Show that

$$[{\rm Ad}(g_1g_2)]^j{}_i = [{\rm Ad}(g_1)]^j{}_k[{\rm Ad}(g_2)]^k{}_i,$$

thus confirming that Ad($g$) is a representation.


15.2.6 Peter-Weyl theorem 


The volume element constructed in section 15.2.4 has the feature that it is 
invariant. In other words if we have a subset $\Omega$ of the group manifold with
volume $V$, then the image set $g\Omega$ under left multiplication has exactly the
same volume. We can also construct a volume element that is invariant under
right multiplication by g, and in general these will be different. For a group 
whose manifold is a compact set, however, both left- and right-invariant 
volume elements coincide. The resulting measure on the group manifold is 
called the Haar measure. 

For a compact group, therefore, we can replace the sums over the group
elements that occur in the representation theory of finite groups, by convergent
integrals over the group elements using the invariant Haar measure,
which is usually denoted by $d[g]$. The invariance property is expressed by
$d[g_1g] = d[g]$ for any constant element $g_1$. This allows us to make a change-of-variables
transformation, $g \to g_1g$, identical to that which played such an
important role in deriving the finite-group theorems. Consequently, all the
results from finite groups, such as the existence of an invariant inner product
and the orthogonality theorems, can be taken over by the simple replacement
of a sum by an integral. In particular, if we normalize the measure so that
the volume of the group manifold is unity, we have the orthogonality relation

$$\int_G d[g]\,\left(D^J_{ij}(g)\right)^*D^K_{lm}(g) = \frac{1}{\dim J}\,\delta^{JK}\delta_{il}\delta_{jm}. \qquad (15.81)$$


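The Peter-Weyl theorem asserts that the representation matrices $D^J_{mn}(g)$
form a complete set of orthogonal functions on the group manifold. In the
case of SU(2) this tells us that the spin $J$ representation matrices

$$\begin{aligned}
D^J_{mn}(\theta,\phi,\psi) &= \langle J,m|\,e^{-i\phi J_3}e^{-i\theta J_2}e^{-i\psi J_3}\,|J,n\rangle \\
&= e^{-im\phi}\,d^J_{mn}(\theta)\,e^{-in\psi}, \qquad (15.82)
\end{aligned}$$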
which you will likely have seen in quantum mechanics courses,$^1$ are a complete
set of functions on the three-sphere with orthogonality relation

$$\frac{1}{16\pi^2}\int_0^\pi\sin\theta\,d\theta\int_0^{2\pi}d\phi\int_0^{4\pi}d\psi\,\left(D^J_{mn}(\theta,\phi,\psi)\right)^*D^{J'}_{m'n'}(\theta,\phi,\psi) = \frac{1}{2J+1}\,\delta^{JJ'}\delta_{mm'}\delta_{nn'}. \qquad (15.83)$$

Since the $D^L_{m0}$ (where $L$ has to be an integer for $n = 0$ to be possible) are
independent of the third Euler angle, $\psi$, we can do the trivial integral over
$\psi$ to obtain the special case

$$\frac{1}{4\pi}\int_0^\pi\sin\theta\,d\theta\int_0^{2\pi}d\phi\,\left(D^L_{m0}(\theta,\phi)\right)^*D^{L'}_{m'0}(\theta,\phi) = \frac{1}{2L+1}\,\delta^{LL'}\delta_{mm'}. \qquad (15.84)$$

Comparing with the definition of the spherical harmonics, we see that we can
identify

$$Y^L_m(\theta,\phi) = \sqrt{\frac{2L+1}{4\pi}}\,\left(D^L_{m0}(\theta,\phi,0)\right)^*. \qquad (15.85)$$

The complex conjugation is necessary here because $D^L_{m0}(\theta,\phi,\psi) \propto e^{-im\phi}$,
while $Y^L_m(\theta,\phi) \propto e^{im\phi}$.

The character, $\chi^J(g) = \sum_nD^J_{nn}(g)$, will be a function only of the rotation
angle $\theta$ and not the axis of rotation, all rotations through a common angle
being conjugate to one another. Because of this, $\chi^J(\theta)$ can be found most
simply by looking at rotations about the $z$ axis, since these give rise to easily
computed diagonal matrices. Thus, we find

$$\begin{aligned}
\chi^J(\theta) &= e^{iJ\theta} + e^{i(J-1)\theta} + \cdots + e^{-i(J-1)\theta} + e^{-iJ\theta} \\
&= \frac{\sin\left((2J+1)\theta/2\right)}{\sin(\theta/2)}. \qquad (15.86)
\end{aligned}$$

Warning: The angle $\theta$ in this formula and the next is not the Euler
angle.

For integer $J$, corresponding to non-spinor rotations, a rotation through
an angle $\theta$ about an axis ${\bf n}$ and a rotation through an angle $2\pi - \theta$ about $-{\bf n}$
are the same operation. The maximum rotation angle is therefore $\pi$. For


$^1$See, for example, G. Baym, Lectures on Quantum Mechanics, Ch. 17.
spinor rotations this equivalence does not hold, and the rotation angle $\theta$ runs
from 0 to $2\pi$. The character orthogonality must therefore be

$$\frac{1}{\pi}\int_0^{2\pi}\chi^J(\theta)\,\chi^{J'}(\theta)\,\sin^2(\theta/2)\,d\theta = \delta^{JJ'}, \qquad (15.87)$$

implying that the volume fraction of the rotation group containing rotations
through angles between $\theta$ and $\theta + d\theta$ is $\sin^2(\theta/2)\,d\theta/\pi$.
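The character formula and its orthogonality relation are easy to test numerically. In the sketch below (an illustrative addition, with hypothetical function names) the spin is passed as the integer `two_j` $= 2J$ so that half-integer spins can be handled exactly, and the integral over $0 \le \theta \le 2\pi$ with weight $\sin^2(\theta/2)/\pi$ is approximated by a midpoint rule.

```python
import math

def character(two_j, theta):
    """chi^J(theta) = sum over m = -J..J of exp(i m theta); real by symmetry."""
    return sum(math.cos(0.5 * two_m * theta)
               for two_m in range(-two_j, two_j + 1, 2))

def character_closed_form(two_j, theta):
    """sin((2J+1) theta/2) / sin(theta/2), valid when sin(theta/2) != 0."""
    return math.sin((two_j + 1) * theta / 2) / math.sin(theta / 2)

def overlap(two_j, two_k, steps=4000):
    """Midpoint estimate of (1/pi) * integral of chi^J chi^K sin^2(theta/2)."""
    h = 2 * math.pi / steps
    total = 0.0
    for n in range(steps):
        theta = (n + 0.5) * h
        total += (character(two_j, theta) * character(two_k, theta)
                  * math.sin(theta / 2) ** 2)
    return total * h / math.pi
```

The two character functions should agree at any angle with $\sin(\theta/2) \ne 0$, and `overlap` should be close to 1 for equal spins and close to 0 otherwise, including when one spin is an integer and the other a half-integer.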


Exercise 15.16: Prove this last statement about the volume of the equivalence
classes by showing that the volume of the unit three-sphere that lies between
a rotation angle of $\theta$ and $\theta + d\theta$ is $2\pi\sin^2(\theta/2)\,d\theta$.


15.2.7 Lie brackets vs. commutators 


There is an irritating minus-sign problem that needs to be acknowledged. 
The Lie bracket $[X,Y]$ of two vector fields is defined by first running along
$X$, then $Y$ and then back in the reverse order. If we do this for the action
of matrices, $X$ and $Y$, on a vector space, then, since the sequence of matrix
operations is to be read from right to left, we have

$$e^{-t_2Y}e^{-t_1X}e^{t_2Y}e^{t_1X} = I - t_1t_2[X,Y] + \cdots, \qquad (15.88)$$


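which has the other sign. Consider, for example, rotations about the $x$, $y$, $z$
axes, and look at the effect these have on the co-ordinates of a point:

$$L_x:\ \begin{cases}\delta y = -z\,\delta\theta \\ \delta z = +y\,\delta\theta\end{cases} \Longrightarrow\ L_x = y\partial_z - z\partial_y, \qquad L_x = \begin{pmatrix}0&0&0\\0&0&-1\\0&1&0\end{pmatrix},$$

$$L_y:\ \begin{cases}\delta z = -x\,\delta\theta \\ \delta x = +z\,\delta\theta\end{cases} \Longrightarrow\ L_y = z\partial_x - x\partial_z, \qquad L_y = \begin{pmatrix}0&0&1\\0&0&0\\-1&0&0\end{pmatrix},$$

$$L_z:\ \begin{cases}\delta x = -y\,\delta\theta \\ \delta y = +x\,\delta\theta\end{cases} \Longrightarrow\ L_z = x\partial_y - y\partial_x, \qquad L_z = \begin{pmatrix}0&-1&0\\1&0&0\\0&0&0\end{pmatrix}.$$

From this we find

$$[L_x, L_y] = -L_z \qquad (15.89)$$

as a Lie bracket of vector fields, but

$$[L_x, L_y] = L_z \qquad (15.90)$$

as a commutator of matrices. This is the reason why it is the left-invariant
vector fields whose Lie bracket coincides with the commutator of the $i\hat\lambda_i$
matrices.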

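Some insight into all this can be had by considering the action of the
left-invariant fields on the representation matrices, $D^J_{mn}(g)$. For example,

$$\begin{aligned}
L_iD^J_{mn}(g) &= \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left(D^J_{mn}(g(1 + i\epsilon\hat\lambda_i)) - D^J_{mn}(g)\right) \\
&= \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left(D^J_{mn'}(g)\,D^J_{n'n}(1 + i\epsilon\hat\lambda_i) - D^J_{mn}(g)\right) \\
&= \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left(D^J_{mn'}(g)\left(\delta_{n'n} + i\epsilon(\hat\Lambda^J_i)_{n'n}\right) - D^J_{mn}(g)\right) \\
&= D^J_{mn'}(g)\,(i\hat\Lambda^J_i)_{n'n}, \qquad (15.91)
\end{aligned}$$

where $\hat\Lambda^J_i$ is the matrix representing $\hat\lambda_i$ in the representation $J$. Repeating
this exercise we find that

$$L_i\left(L_jD^J_{mn}(g)\right) = D^J_{mn''}(g)\,(i\hat\Lambda^J_i)_{n''n'}(i\hat\Lambda^J_j)_{n'n}. \qquad (15.92)$$

Thus

$$[L_i, L_j]\,D^J_{mn}(g) = D^J_{mn'}(g)\,[i\hat\Lambda^J_i, i\hat\Lambda^J_j]_{n'n}, \qquad (15.93)$$

and we get the commutator of the representation matrices in the "correct"
order only if we multiply the infinitesimal elements in successively from the
right.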

There appears to be no escape from this sign problem. Many texts simply
ignore it, a few define the Lie bracket of vector fields with the opposite sign,
and a few simply point out the inconvenience and get on with the job.
We will follow the last route.
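The sign discrepancy can be made concrete with a few lines of code. The sketch below is an illustrative addition with hypothetical names. It treats each rotation generator both as a matrix and as the linear vector field with components $(Lx)^a$, and computes the Lie bracket $[V,W]^a = V^b\partial_bW^a - W^b\partial_bV^a$ by central differences, which are exact for linear fields up to rounding.

```python
# Generators of rotations about the x, y and z axes, as 3-by-3 matrices.
LX = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
LY = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
LZ = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Matrix commutator ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def field(m):
    """Vector field whose components at the point x are (m x)^a."""
    return lambda x: [sum(m[a][b] * x[b] for b in range(3)) for a in range(3)]

def directional_derivative(f, x, direction, h=1e-3):
    """Central-difference derivative of the vector-valued f along direction."""
    plus = f([xi + h * di for xi, di in zip(x, direction)])
    minus = f([xi - h * di for xi, di in zip(x, direction)])
    return [(p - q) / (2 * h) for p, q in zip(plus, minus)]

def lie_bracket(v, w, x):
    """Components of [V, W] at x: (V.grad)W - (W.grad)V."""
    dw = directional_derivative(w, x, v(x))
    dv = directional_derivative(v, x, w(x))
    return [a - b for a, b in zip(dw, dv)]
```

Evaluating `lie_bracket(field(LX), field(LY), x)` at any point reproduces the field of `-LZ`, whereas `commutator(LX, LY)` is `LZ` itself.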


15.3 Lie algebras 


A Lie algebra $\mathfrak{g}$ is a (real or complex) finite-dimensional vector space with a
non-associative binary operation $\mathfrak{g}\times\mathfrak{g} \to \mathfrak{g}$ that assigns to each ordered pair
of elements, $X_1$, $X_2$, a third element called the Lie bracket, $[X_1, X_2]$. The
bracket is:
a) Skew symmetric: $[X,Y] = -[Y,X]$,

b) Linear: $[\lambda X + \mu Y, Z] = \lambda[X,Z] + \mu[Y,Z]$,
and in place of associativity, obeys

c) The Jacobi identity: $[[X,Y],Z] + [[Y,Z],X] + [[Z,X],Y] = 0$.
Example: Let $M(n)$ denote the algebra of real $n$-by-$n$ matrices. As a vector
space over $\mathbb{R}$, this algebra is $n^2$ dimensional. Setting $[A,B] = AB - BA$
makes $M(n)$ into a Lie algebra.
Example: Let $\mathfrak{b}^+$ denote the subset of $M(n)$ consisting of upper triangular
matrices with any number (including zero) allowed on the diagonal. Then
$\mathfrak{b}^+$ with the above bracket is a Lie algebra. (The "b" stands for Borel, after
the Swiss mathematician Armand Borel.)
Example: Let $\mathfrak{n}^+$ denote the subset of $\mathfrak{b}^+$ consisting of strictly upper triangular
matrices, those with zero on the diagonal. Then $\mathfrak{n}^+$ with the above
bracket is a Lie algebra. (The "n" stands for nilpotent.)
Example: Let $G$ be a Lie group, and $L_i$ the left invariant vector fields. We
know that

$$[L_i, L_j] = f_{ij}{}^kL_k, \qquad (15.94)$$

where $[\ ,\ ]$ is the Lie bracket of vector fields. The resulting Lie algebra,
$\mathfrak{g} = {\rm Lie}\,G$, is the Lie algebra of the group.

Example: The set $N^+$ of upper triangular matrices with 1's on the diagonal
forms a Lie group and has $\mathfrak{n}^+$ as its Lie algebra. Similarly, the set $B^+$
consisting of upper triangular matrices, with any non-zero number allowed
on the diagonal, is also a Lie group, and has $\mathfrak{b}^+$ as its Lie algebra.


Ideals and quotient algebras 


As we saw in the examples, we can define subalgebras of a Lie algebra. If 
we want to define quotient algebras by analogy to quotient groups, we need 
a concept analogous to that of invariant subgroups. This is provided by the 
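notion of an ideal. An ideal is a subalgebra $\mathfrak{i} \subseteq \mathfrak{g}$ with the property that

$$[\mathfrak{i}, \mathfrak{g}] \subseteq \mathfrak{i}. \qquad (15.95)$$

In other words, taking the bracket of any element of $\mathfrak{g}$ with any element of $\mathfrak{i}$
gives an element in $\mathfrak{i}$. With this definition we can form $\mathfrak{g} - \mathfrak{i}$ by identifying
$X \sim X + I$ for any $I \in \mathfrak{i}$. Then

$$[X + \mathfrak{i}, Y + \mathfrak{i}] = [X,Y] + \mathfrak{i}, \qquad (15.96)$$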
and the bracket of two equivalence classes is insensitive to the choice of
representatives.

If a Lie group $G$ has an invariant subgroup $H$ that is also a Lie group,
then the Lie algebra $\mathfrak{h}$ of the subgroup is an ideal in $\mathfrak{g} = {\rm Lie}\,G$, and the Lie
algebra of the quotient group $G/H$ is the quotient algebra $\mathfrak{g} - \mathfrak{h}$.

If the Lie algebra has no non-trivial ideals, then it is said to be simple.
The Lie algebra of a simple Lie group will be simple.


Exercise 15.17: Let $\mathfrak{i}_1$ and $\mathfrak{i}_2$ be ideals in $\mathfrak{g}$. Show that $\mathfrak{i}_1 \cap \mathfrak{i}_2$ is also an ideal
in $\mathfrak{g}$.


15.3.1 Adjoint representation 


Given an element $X \in \mathfrak{g}$, let it act on the Lie algebra, considered as a vector
space, by a linear map ad($X$) defined by

$${\rm ad}(X)Y = [X,Y]. \qquad (15.97)$$

The Jacobi identity is then equivalent to the statement:

$$\left({\rm ad}(X)\,{\rm ad}(Y) - {\rm ad}(Y)\,{\rm ad}(X)\right)Z = {\rm ad}([X,Y])Z. \qquad (15.98)$$

Thus

$$\left({\rm ad}(X)\,{\rm ad}(Y) - {\rm ad}(Y)\,{\rm ad}(X)\right) = {\rm ad}([X,Y]), \qquad (15.99)$$

or

$$[{\rm ad}(X), {\rm ad}(Y)] = {\rm ad}([X,Y]), \qquad (15.100)$$

and the map $X \to {\rm ad}(X)$ is a representation of the algebra called the adjoint
representation.

The linear map "ad($X$)" exponentiates to give a map $\exp[{\rm ad}(tX)]$ defined
by

$$\exp[{\rm ad}(tX)]Y = Y + t[X,Y] + \frac{1}{2}t^2[X,[X,Y]] + \cdots. \qquad (15.101)$$

You probably know the matrix identity$^2$

$$e^{tA}Be^{-tA} = B + t[A,B] + \frac{1}{2}t^2[A,[A,B]] + \cdots. \qquad (15.102)$$

$^2$In case you do not, it is easily proved by setting $F(t) = e^{tA}Be^{-tA}$, noting that
$\frac{d}{dt}F(t) = [A, F(t)]$, and observing that the RHS is the unique series solution to this
equation satisfying the boundary condition $F(0) = B$.
Now, earlier in the chapter, we defined the adjoint representation "Ad" of
the group on the vector space of the Lie algebra. We did this by setting
$gXg^{-1} = {\rm Ad}(g)X$. Comparing the two previous equations we see that

$${\rm Ad}({\rm Exp}\,Y) = \exp({\rm ad}(Y)). \qquad (15.103)$$


15.3.2. The Killing form 


Using "ad" we can define an inner product $\langle\ ,\ \rangle$ on a real Lie algebra by
setting

$$\langle X, Y\rangle = {\rm tr}\left({\rm ad}(X)\,{\rm ad}(Y)\right). \qquad (15.104)$$

This inner product is called the Killing form, after Wilhelm Killing. Using
the Jacobi identity and the cyclic property of the trace, we find that

$$\langle{\rm ad}(X)Y, Z\rangle + \langle Y, {\rm ad}(X)Z\rangle = 0, \qquad (15.105)$$

or, equivalently,

$$\langle[X,Y], Z\rangle + \langle Y, [X,Z]\rangle = 0. \qquad (15.106)$$

From this we deduce (by differentiating with respect to $t$) that

$$\langle\exp({\rm ad}(tX))Y, \exp({\rm ad}(tX))Z\rangle = \langle Y, Z\rangle, \qquad (15.107)$$


so the Killing form is invariant under the action of the adjoint representation 
of the group on the algebra. When our group is simple, any other invariant 
inner product will be proportional to this Killing-form product. 


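Exercise 15.18: Let $\mathfrak{i}$ be an ideal in $\mathfrak{g}$. Show that for $I_1, I_2 \in \mathfrak{i}$

$$\langle I_1, I_2\rangle_{\mathfrak{g}} = \langle I_1, I_2\rangle_{\mathfrak{i}},$$

where $\langle\ ,\ \rangle_{\mathfrak{i}}$ is the Killing form on $\mathfrak{i}$ considered as a Lie algebra in its own
right. (This equality of inner products is not true for subalgebras that are not
ideals.)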


Semisimplicity 


Recall that a Lie algebra containing no non-trivial ideals is said to be simple.
When the Killing form is non-degenerate, the Lie algebra is said to be
semisimple. The reason for this name is that a semisimple algebra is almost
simple, in that it can be decomposed into a direct sum of decoupled simple
algebras:

$$\mathfrak{g} = \mathfrak{s}_1 \oplus \mathfrak{s}_2 \oplus \cdots \oplus \mathfrak{s}_n. \qquad (15.108)$$

By "decoupled" we mean that the direct sum symbol "$\oplus$" implies not only
a direct sum of vector spaces but also that $[\mathfrak{s}_i, \mathfrak{s}_j] = 0$ for $i \ne j$.

The Lie algebras of all the matrix groups O($n$), Sp($n$), SU($n$), etc. are
semisimple (indeed they are usually simple) but this is not true of the algebras
$\mathfrak{n}^+$ and $\mathfrak{b}^+$.

Cartan showed that our Killing-form definition of semisimplicity is equivalent
to his original definition of a Lie algebra being semisimple if the algebra
contains no non-zero abelian ideal, i.e. no ideal with $[I_i, I_j] = 0$ for all
$I_i \in \mathfrak{i}$. The following exercises establish the direct sum decomposition, and,
en passant, the easy half of Cartan's result.


Exercise 15.19: Use the identity (15.106) to show that if $\mathfrak{i} \subseteq \mathfrak{g}$ is an ideal, then
$\mathfrak{i}^\perp$, the set of elements orthogonal to $\mathfrak{i}$ with respect to the Killing form, is also
an ideal.


Exercise 15.20: Show that if a is an abelian ideal, then every element of 
a is Killing perpendicular to the entire Lie algebra. (Thus, non-degeneracy 
implies no non-trivial abelian ideal. The null space of the Killing form is not 
necessarily an abelian ideal, though, so establishing the converse is harder.) 


Exercise 15.21: Let $\mathfrak{g}$ be a semisimple Lie algebra and $\mathfrak{i} \subseteq \mathfrak{g}$ an ideal. We
know from exercise 15.17 that $\mathfrak{i} \cap \mathfrak{i}^\perp$ is an ideal. Use (15.106), coupled with
the non-degeneracy of the Killing form, to show that it is an abelian ideal.
Use the previous exercise to conclude that $\mathfrak{i} \cap \mathfrak{i}^\perp = \{0\}$, and from this that
$[\mathfrak{i}, \mathfrak{i}^\perp] = 0$.


Exercise 15.22: Let $\langle\ ,\ \rangle$ be a non-degenerate inner product on a vector space
$V$. Let $W \subseteq V$ be a subspace. Show that

$$\dim W + \dim W^\perp = \dim V.$$

(This is not as obvious as it looks. For a non-positive-definite inner product,
$W$ and $W^\perp$ can have a non-trivial intersection. Consider two-dimensional
Minkowski space. If $W$ is the space of right-going, light-like, vectors then
$W = W^\perp$, but $\dim W + \dim W^\perp$ still equals two.)
Exercise 15.23: Put the two preceding exercises together to show that

$$\mathfrak{g} = \mathfrak{i} \oplus \mathfrak{i}^\perp.$$

Show that $\mathfrak{i}$ and $\mathfrak{i}^\perp$ are semisimple in their own right as Lie algebras. We can
therefore continue to break up $\mathfrak{i}$ and $\mathfrak{i}^\perp$ until we end with $\mathfrak{g}$ decomposed into
a direct sum of simple algebras.


Compactness 


If the Killing form is negative definite, a real Lie algebra is said to be compact,
and is the Lie algebra of a compact group. With the physicist's habit of
writing $iX_i$ for the generators of the Lie algebra, a compact group has Killing
metric tensor

$$g_{ij} \equiv {\rm tr}\{{\rm ad}(X_i)\,{\rm ad}(X_j)\} \qquad (15.109)$$

that is a positive definite matrix. In a basis where $g_{ij} = \delta_{ij}$, the $\exp({\rm ad}\,X)$
matrices of the adjoint representations of a compact group $G$ form a subgroup
of the orthogonal group O($N$), where $N$ is the dimension of $G$.


Totally anti-symmetric structure constants 


Given a basis $iX_i$ for the Lie-algebra vector space, we define the structure
constants $f_{ij}{}^k$ through

$$[X_i, X_j] = if_{ij}{}^kX_k. \qquad (15.110)$$

In terms of the $f_{ij}{}^k$, the skew symmetry of ad($X_i$), as expressed by equation
(15.105), becomes

$$\begin{aligned}
0 &= \langle{\rm ad}(X_k)X_i, X_j\rangle + \langle X_i, {\rm ad}(X_k)X_j\rangle \\
&= \langle[X_k, X_i], X_j\rangle + \langle X_i, [X_k, X_j]\rangle \\
&= i\left(f_{ki}{}^lg_{lj} + g_{il}f_{kj}{}^l\right) \\
&= i\left(f_{kij} + f_{kji}\right). \qquad (15.111)
\end{aligned}$$

In the last line we have used the Killing metric to "lower" the index $l$ and so
define the symbol $f_{ijk}$. Thus, $f_{ijk}$ is skew symmetric under the interchange
of its second pair of indices. Since the skew symmetry of the Lie bracket
ensures that $f_{ijk}$ is skew symmetric under the interchange of the first pair of
indices, it follows that $f_{ijk}$ is skew symmetric under the interchange of any
pair of its indices.
By comparing the definition of the structure constants with

$$[X_i, X_j] = {\rm ad}(X_i)X_j = X_k[{\rm ad}(X_i)]^k{}_j, \qquad (15.112)$$

we read off that the matrix representing ad($X_i$) has entries

$$[{\rm ad}(X_i)]^k{}_j = if_{ij}{}^k. \qquad (15.113)$$

Consequently

$$g_{ij} = {\rm tr}\{{\rm ad}(X_i)\,{\rm ad}(X_j)\} = -f_{ik}{}^lf_{jl}{}^k. \qquad (15.114)$$
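As a worked instance of these formulas, take su(2), whose structure constants in the physicist's convention are $f_{ij}{}^k = \epsilon_{ijk}$. The sketch below is an illustrative addition, and the choice of su(2) and the function names are assumptions made for the example. It builds the matrices of ad($X_i$) from the structure constants and evaluates the Killing metric both as a trace and from the contraction $-f_{ik}{}^lf_{jl}{}^k$.

```python
def epsilon(i, j, k):
    """Levi-Civita symbol for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def ad_matrix(i):
    """Matrix of ad(X_i): entry [k][j] is i * f_ij^k with f_ij^k = epsilon_ijk."""
    return [[1j * epsilon(i, j, k) for j in range(3)] for k in range(3)]

def killing_metric_from_trace():
    """g_ij = tr(ad(X_i) ad(X_j))."""
    g = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            a, b = ad_matrix(i), ad_matrix(j)
            g[i][j] = sum(a[k][l] * b[l][k]
                          for k in range(3) for l in range(3)).real
    return g

def killing_metric_from_structure_constants():
    """g_ij = - f_ik^l f_jl^k, summed over k and l."""
    return [[-sum(epsilon(i, k, l) * epsilon(j, l, k)
                  for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]
```

Both functions give $g_{ij} = 2\delta_{ij}$, a positive definite matrix, as expected for the compact group SU(2).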


The quadratic Casimir 


The only "product" that is defined in the abstract Lie algebra $\mathfrak{g}$ is the Lie
bracket $[X,Y]$. Once we have found matrices forming a representation of
the Lie algebra, however, we can form the ordinary matrix product of these.
Suppose that we have a Lie algebra $\mathfrak{g}$ with basis $X_i$, and have found matrices
$\hat{X}_i$ with the same commutation relations as the $X_i$. Suppose, further, that
the algebra is semisimple and so $g^{ij}$, the inverse of the Killing metric, exists.
We can use $g^{ij}$ to construct the matrix

$$\hat{C}_2 = g^{ij}\hat{X}_i\hat{X}_j. \qquad (15.115)$$

This matrix is called the quadratic Casimir operator, after Hendrik Casimir.
Its chief property is that it commutes with all the $\hat{X}_i$:

$$[\hat{C}_2, \hat{X}_i] = 0. \qquad (15.116)$$

If our representation is irreducible then Schur's lemma tells us that

$$\hat{C}_2 = c_2I, \qquad (15.117)$$

where the number $c_2$ is referred to as the "value" of the quadratic Casimir
in that irrep.$^3$


Exercise 15.24: Show that $[\hat{C}_2, \hat{X}_i] = 0$ is another consequence of the complete
skew symmetry of the $f_{ijk}$.


$^3$Mathematicians do sometimes consider formal products of Lie algebra elements $X, Y \in \mathfrak{g}$.
When they do, they equip them with the rule that $XY - YX - [X,Y] = 0$, where $XY$
and YX are formal products, and [X, Y] is the Lie algebra product. These formal products 
are not elements of the Lie algebra, but instead live in an extended mathematical structure 
called the Universal enveloping algebra of g, and denoted by U(g). The quadratic Casimir 
can then be considered to be an element of this larger algebra. 
15.3.3 Roots and weights


We now want to study the representation theory of Lie groups. It is, in fact, 
easier to study the representations of the corresponding Lie algebra and then 
exponentiate these to find the representations of the group. In other words, 
given an abstract Lie algebra with bracket

$$[X_i, X_j] = if_{ij}{}^kX_k, \qquad (15.118)$$

we seek to find all matrices $X^J_i$ such that

$$[X^J_i, X^J_j] = if_{ij}{}^kX^J_k. \qquad (15.119)$$

(Here, as with the representations of finite groups, we use the superscript $J$ to
distinguish one representation from another.) Then, given a representation
$X^J_i$ of the Lie algebra, the matrices

$$D^J(g(\xi)) = \exp\{i\xi^iX^J_i\}, \qquad (15.120)$$

where $g(\xi) = {\rm Exp}\{i\xi^iX_i\}$, will form a representation of the Lie group. To be
more precise, they will form a representation of the part of the group that
is connected to the identity element. The numbers $\xi^i$ serve as co-ordinates
for some neighbourhood of the identity. For compact groups there will be
a restriction on the range of the $\xi^i$, because there must be $\xi^i$ for which
$\exp\{i\xi^iX^J_i\} = I$.


SU(2) 


The quantum-mechanical angular momentum algebra consists of the commutation
relation

$$[J_1, J_2] = i\hbar J_3, \qquad (15.121)$$

together with two similar equations related by cyclic permutations. This,
once we set $\hbar = 1$, is the Lie algebra su(2) of the group SU(2). The goal
of representation theory is to find all possible sets of matrices that have the
same commutation relations as these operators. Since the group SU(2) is
compact, we can use the group-averaging trick from section 14.2.2 to define
an inner product with respect to which these representations are unitary, and
the matrices $J_i$ hermitian.
Remember how this problem is solved in quantum mechanics courses,
where we find a representation for each spin $j = \frac12, 1, \frac32$, etc. We begin by
constructing "ladder" operators

$$J_+ \equiv J_1 + iJ_2, \qquad J_- = J_+^\dagger \equiv J_1 - iJ_2, \qquad (15.122)$$

which are eigenvectors of ad($J_3$)

$${\rm ad}(J_3)J_\pm = [J_3, J_\pm] = \pm J_\pm. \qquad (15.123)$$

From (15.123) we see that if $|j,m\rangle$ is an eigenstate of $J_3$ with eigenvalue $m$,
then $J_\pm|j,m\rangle$ is an eigenstate of $J_3$ with eigenvalue $m \pm 1$.

Now, in any finite-dimensional representation there must be a highest
weight state, $|j,j\rangle$, such that $J_3|j,j\rangle = j|j,j\rangle$ for some real number $j$, and
such that $J_+|j,j\rangle = 0$. From $|j,j\rangle$ we work down by successive applications
of $J_-$ to find $|j,j-1\rangle$, $|j,j-2\rangle$, ... We can find the normalization factors of
the states $|j,m\rangle \propto (J_-)^{j-m}|j,j\rangle$ by repeated use of the identities

$$\begin{aligned}
J_+J_- &= (J_1^2 + J_2^2 + J_3^2) - (J_3^2 - J_3), \\
J_-J_+ &= (J_1^2 + J_2^2 + J_3^2) - (J_3^2 + J_3). \qquad (15.124)
\end{aligned}$$

The combination $J^2 \equiv J_1^2 + J_2^2 + J_3^2$ is the quadratic Casimir of su(2), and
hence in any irrep is proportional to the identity matrix: $J^2 = c_2I$. Because

$$\begin{aligned}
0 &= \|J_+|j,j\rangle\|^2 \\
&= \langle j,j|J_+^\dagger J_+|j,j\rangle \\
&= \langle j,j|J_-J_+|j,j\rangle \\
&= \langle j,j|\left(J^2 - J_3(J_3+1)\right)|j,j\rangle \\
&= [c_2 - j(j+1)]\,\langle j,j|j,j\rangle, \qquad (15.125)
\end{aligned}$$

and $\langle j,j|j,j\rangle \equiv \||j,j\rangle\|^2$ is not zero, we must have $c_2 = j(j+1)$.


We now compute

$$\begin{aligned}
\|J_-|j,m\rangle\|^2 &= \langle j,m|J_-^\dagger J_-|j,m\rangle \\
&= \langle j,m|J_+J_-|j,m\rangle \\
&= \langle j,m|\left(J^2 - J_3(J_3-1)\right)|j,m\rangle \\
&= [j(j+1) - m(m-1)]\,\langle j,m|j,m\rangle, \qquad (15.126)
\end{aligned}$$

and deduce that the resulting set of normalized states $|j,m\rangle$ can be chosen
to obey

$$\begin{aligned}
J_3|j,m\rangle &= m|j,m\rangle, \\
J_-|j,m\rangle &= \sqrt{j(j+1) - m(m-1)}\,|j,m-1\rangle, \\
J_+|j,m\rangle &= \sqrt{j(j+1) - m(m+1)}\,|j,m+1\rangle. \qquad (15.127)
\end{aligned}$$

If we take $j$ to be an integer or a half-integer, we will find that $J_-|j,-j\rangle = 0$.
In this case we are able to construct a total of $2j+1$ states, one for each
integer-spaced $m$ in the range $-j \le m \le j$. If we select some other fractional
value for $j$, then the set of states will not terminate, and we will find an
infinity of states with $m < -j$. These will have $\|J_-|j,m\rangle\|^2 < 0$, so the
resultant representation cannot be unitary.
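The matrix elements (15.127) can be turned directly into explicit spin-$j$ matrices. The sketch below is an illustrative addition with hypothetical names. It takes the integer `two_j` $= 2j$, orders the basis as $m = j, j-1, \ldots, -j$, and fills in $J_3$, $J_+$ and $J_-$, together with helpers for checking the commutation relations and the value of $J^2$.

```python
import math

def spin_matrices(two_j):
    """Return (J3, Jplus, Jminus) for j = two_j / 2, basis m = j, j-1, ..., -j."""
    j = two_j / 2
    dim = two_j + 1
    ms = [j - k for k in range(dim)]
    j3 = [[ms[r] if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    jp = [[0.0] * dim for _ in range(dim)]
    jm = [[0.0] * dim for _ in range(dim)]
    for c, m in enumerate(ms):
        if c > 0:
            # J+ |j,m> = sqrt(j(j+1) - m(m+1)) |j,m+1>, and |j,m+1> has index c-1.
            jp[c - 1][c] = math.sqrt(j * (j + 1) - m * (m + 1))
        if c < dim - 1:
            # J- |j,m> = sqrt(j(j+1) - m(m-1)) |j,m-1>, and |j,m-1> has index c+1.
            jm[c + 1][c] = math.sqrt(j * (j + 1) - m * (m - 1))
    return j3, jp, jm

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Matrix commutator ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def casimir(two_j):
    """J^2 = J3^2 + (J+ J- + J- J+)/2."""
    j3, jp, jm = spin_matrices(two_j)
    a, b, c = matmul(j3, j3), matmul(jp, jm), matmul(jm, jp)
    n = len(j3)
    return [[a[i][k] + 0.5 * (b[i][k] + c[i][k]) for k in range(n)]
            for i in range(n)]
```

For example, `two_j = 3` yields 4-by-4 matrices obeying $[J_3, J_+] = J_+$ and $[J_+, J_-] = 2J_3$, and `casimir(3)` is $\frac{15}{4}$ times the identity, in agreement with $c_2 = j(j+1)$.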


SU(3) 


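The strategy of finding ladder operators works for any semisimple Lie algebra.
Consider, for example, su(3) = Lie(SU(3)). The matrix Lie algebra su(3) is
spanned by the Gell-Mann $\lambda$-matrices

$$\begin{aligned}
\hat\lambda_1 &= \begin{pmatrix}0&1&0\\1&0&0\\0&0&0\end{pmatrix}, &
\hat\lambda_2 &= \begin{pmatrix}0&-i&0\\i&0&0\\0&0&0\end{pmatrix}, &
\hat\lambda_3 &= \begin{pmatrix}1&0&0\\0&-1&0\\0&0&0\end{pmatrix}, \\
\hat\lambda_4 &= \begin{pmatrix}0&0&1\\0&0&0\\1&0&0\end{pmatrix}, &
\hat\lambda_5 &= \begin{pmatrix}0&0&-i\\0&0&0\\i&0&0\end{pmatrix}, &
\hat\lambda_6 &= \begin{pmatrix}0&0&0\\0&0&1\\0&1&0\end{pmatrix}, \\
\hat\lambda_7 &= \begin{pmatrix}0&0&0\\0&0&-i\\0&i&0\end{pmatrix}, &
\hat\lambda_8 &= \frac{1}{\sqrt3}\begin{pmatrix}1&0&0\\0&1&0\\0&0&-2\end{pmatrix}, && \qquad (15.128)
\end{aligned}$$

which form a basis for the real vector space of 3-by-3 traceless, hermitian
matrices. They have been chosen and normalized so that

$${\rm tr}(\hat\lambda_i\hat\lambda_j) = 2\delta_{ij}, \qquad (15.129)$$

by analogy with the properties of the Pauli matrices. Notice that $\hat\lambda_3$ and $\hat\lambda_8$
commute with each other, and that this will be true in any representation.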
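The matrices

$$\begin{aligned}
t_\pm &\equiv \frac12(\hat\lambda_1 \pm i\hat\lambda_2), \\
v_\pm &\equiv \frac12(\hat\lambda_4 \pm i\hat\lambda_5), \\
u_\pm &\equiv \frac12(\hat\lambda_6 \pm i\hat\lambda_7), \qquad (15.130)
\end{aligned}$$

have unit entries, rather like the step-up and step-down matrices
$\sigma_\pm = \frac12(\hat\sigma_1 \pm i\hat\sigma_2)$.

Let us define $\lambda_i$ to be abstract operators with the same commutation
relations as $\hat\lambda_i$, and define

$$\begin{aligned}
T_\pm &= \frac12(\lambda_1 \pm i\lambda_2), \\
V_\pm &= \frac12(\lambda_4 \pm i\lambda_5), \\
U_\pm &= \frac12(\lambda_6 \pm i\lambda_7). \qquad (15.131)
\end{aligned}$$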

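These are simultaneous eigenvectors of the commuting pair of operators
$\mathrm{ad}(\Lambda_3)$ and $\mathrm{ad}(\Lambda_8)$:

\[
\begin{aligned}
\mathrm{ad}(\Lambda_3)T_\pm &= [\Lambda_3, T_\pm] = \pm 2T_\pm,\\
\mathrm{ad}(\Lambda_3)V_\pm &= [\Lambda_3, V_\pm] = \pm V_\pm,\\
\mathrm{ad}(\Lambda_3)U_\pm &= [\Lambda_3, U_\pm] = \mp U_\pm,\\
\mathrm{ad}(\Lambda_8)T_\pm &= [\Lambda_8, T_\pm] = 0,\\
\mathrm{ad}(\Lambda_8)V_\pm &= [\Lambda_8, V_\pm] = \pm\sqrt3\,V_\pm,\\
\mathrm{ad}(\Lambda_8)U_\pm &= [\Lambda_8, U_\pm] = \pm\sqrt3\,U_\pm.
\end{aligned}
\tag{15.132}
\]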


Thus, in any representation, the $T_\pm$, $U_\pm$, $V_\pm$ act as ladder operators, changing
the simultaneous eigenvalues of the commuting pair $\Lambda_3$, $\Lambda_8$. Their
eigenvalues, $\lambda_3$, $\lambda_8$, are called the weights, and there will be a set of such weights
for each possible representation. By using the ladder operators one can go
from any weight in a representation to any other, but one cannot get outside
this set. The amounts by which the ladder operators change the weights are
called the roots or root vectors, and the root diagram characterizes the Lie
algebra.


Figure 15.2: The root vectors of su(3).
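
The root vectors can be read off numerically from the defining representation. This sketch assumes the identification of $t_\pm$, $v_\pm$, $u_\pm$ with single-entry matrices implied by (15.130); the dictionary keys and function names are illustrative.

```python
import math

def unit(i, j):
    """3x3 matrix with a single 1 in row i, column j (0-based)."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(3)]
            for r in range(3)]

def diag(a, b, c):
    return [[a, 0.0, 0.0], [0.0, b, 0.0], [0.0, 0.0, c]]

def commutator(a, b):
    def mul(x, y):
        return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
                for r in range(3)]
    ab, ba = mul(a, b), mul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(3)] for r in range(3)]

S3 = math.sqrt(3)
LAMBDA3 = diag(1.0, -1.0, 0.0)
LAMBDA8 = diag(1 / S3, 1 / S3, -2 / S3)
LADDERS = {
    "T+": unit(0, 1), "T-": unit(1, 0),
    "V+": unit(0, 2), "V-": unit(2, 0),
    "U+": unit(1, 2), "U-": unit(2, 1),
}

def eigenvalue(h, e):
    """Return a with [h, e] = a * e, or None if e is not an eigenvector."""
    c = commutator(h, e)
    r0, c0 = next((r, col) for r in range(3) for col in range(3)
                  if e[r][col] != 0)
    a = c[r0][c0] / e[r0][c0]
    ok = all(abs(c[r][col] - a * e[r][col]) < 1e-12
             for r in range(3) for col in range(3))
    return a if ok else None

def root(name):
    """Root vector (alpha_3, alpha_8) of the named ladder operator."""
    e = LADDERS[name]
    return (eigenvalue(LAMBDA3, e), eigenvalue(LAMBDA8, e))
```

The six results are the six non-zero vectors of the root diagram: $\pm(2,0)$, $\pm(1,\sqrt3)$ and $\pm(-1,\sqrt3)$.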


In a finite-dimensional representation there must be a highest-weight state
$|\lambda_3, \lambda_8\rangle$ that is killed by all three of $U_+$, $T_+$ and $V_+$. We can then obtain
all other states in the representation by repeatedly acting on the highest-
weight state with $U_-$, $T_-$ or $V_-$ and their products. Since there is usually
more than one route by which we can step down from the highest weight to
another weight, the weight spaces may be degenerate; i.e., there may be
more than one linearly independent state with the same eigenvalues of $\Lambda_3$
and $\Lambda_8$. Exactly what states are obtained, and with what multiplicity, is not
immediately obvious. We will therefore restrict ourselves to describing the
outcome of this procedure without giving proofs.


What we find is that the weights in a finite-dimensional representation of
su(3) form a hexagonally symmetric “crystal” lying on a triangular lattice,
and the representations may be labelled by pairs of integers (zero allowed)
$p$, $q$ which give the lengths of the sides of the crystal. These representations
have dimension $d = \tfrac12(p+1)(q+1)(p+q+2)$.


Figure 15.3: The weight diagram of the 24-dimensional irrep with $p = 3$,
$q = 1$. The highest weight is shaded.
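
As a quick check of the dimension formula, the following sketch (the function name is an illustrative choice) evaluates $d$ for a few small $(p, q)$.

```python
def su3_dimension(p, q):
    """Dimension of the su(3) irrep labelled by non-negative integers (p, q).

    The product (p+1)(q+1)(p+q+2) is always even, so the integer
    division by 2 is exact.
    """
    if p < 0 or q < 0:
        raise ValueError("p and q must be non-negative integers")
    return (p + 1) * (q + 1) * (p + q + 2) // 2
```

The labels $(1,0)$ and $(0,1)$ both give 3, $(1,1)$ gives 8, $(3,0)$ gives 10, and $(3,1)$ gives the 24 states of figure 15.3.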


Figure 15.3 shows the set of weights occurring in the representation of SU(3)
with $p = 3$ and $q = 1$. Each circle represents a state, whose weight $(\lambda_3, \lambda_8)$
may be read off from the displayed axes. A double circle indicates that there
are two linearly independent vectors with the same weight. A count confirms 
that the number of independent weights, and hence the dimension of the 
representation, is 24. For SU(3) representations the degeneracy — i.e. the 
number of states with a given weight — increases by unity at each “layer” 
until we reach a triangular inner core, all of whose weights have the same 
degeneracy. 


In particle physics applications, representations are often labelled by their
dimension. The defining representation of SU(3) and its complex conjugate
are denoted by $3$ and $\bar 3$,

Figure 15.4: The weight diagrams of the irreps with $p = 1$, $q = 0$, and $p = 0$,
$q = 1$, also known, respectively, as the $3$ and the $\bar 3$.


while the weight diagrams of the adjoint representation, $8$, and the $10$ have
the shapes shown in figure 15.5.


Figure 15.5: The irreps 8 (the adjoint) and 10.

Cartan algebras: roots and co-roots 


For a general simple Lie algebra we may play the same game. We first find
a maximal linearly-independent set of commuting generators $h_i$. These $h_i$
form a basis for the Cartan algebra $\mathfrak{h}$, whose dimension is the rank of the
Lie algebra. We next find ladder operators by diagonalizing the “ad” action
of the $h_i$ on the rest of the algebra:

\[
\mathrm{ad}(h_i)\,e_\alpha = [h_i, e_\alpha] = \alpha_i e_\alpha. \tag{15.133}
\]


The simultaneous eigenvectors $e_\alpha$ are the ladder operators that change the
eigenvalues of the $h_i$. The corresponding eigenvalues $\alpha$, thought of as vectors
with components $\alpha_i$, are the roots, or root vectors. The roots are therefore
the weights of the adjoint representation. It is possible to put factors of “$i$”
in appropriate places so that the $\alpha_i$ are real, and we will assume that this
has been done. For example, in su(3) we have already seen that $\alpha_T = (2, 0)$,
$\alpha_V = (1, \sqrt3)$, $\alpha_U = (-1, \sqrt3)$.


Here are the basic properties and ideas that emerge from this process: 


i) Since $\alpha_i\langle e_\alpha, h_j\rangle = \langle \mathrm{ad}(h_i)e_\alpha, h_j\rangle = -\langle e_\alpha, [h_i, h_j]\rangle = 0$, we see that
   $\langle h_i, e_\alpha\rangle = 0$.

ii) Similarly, we see that $(\alpha_i + \beta_i)\langle e_\alpha, e_\beta\rangle = 0$, so the $e_\alpha$ are orthogonal to
   one another unless $\alpha + \beta = 0$. Since our Lie algebra is semisimple, and
   consequently the Killing form non-degenerate, we deduce that if $\alpha$ is a
   root, so is $-\alpha$.

iii) Since the Killing form is non-degenerate, yet the $h_i$ are orthogonal to
   all the $e_\alpha$, it must also be non-degenerate when restricted to the Cartan
   algebra. Thus, the metric tensor, $g_{ij} = \langle h_i, h_j\rangle$, must be invertible with
   inverse $g^{ij}$. We will use the notation $\alpha\cdot\beta$ to represent $\alpha_i\beta_j g^{ij}$.

iv) If $\alpha$, $\beta$ are roots, then the Jacobi identity shows that

   \[
   [h_i, [e_\alpha, e_\beta]] = (\alpha_i + \beta_i)[e_\alpha, e_\beta],
   \]

   so if $[e_\alpha, e_\beta]$ is non-zero then $\alpha + \beta$ is also a root, and $[e_\alpha, e_\beta] \propto e_{\alpha+\beta}$.

v) It follows from iv) that $[e_\alpha, e_{-\alpha}]$ commutes with all the $h_i$, and since $\mathfrak{h}$
   was assumed maximal, it must either be zero or a linear combination
   of the $h_i$. A short calculation shows that

   \[
   \langle h_i, [e_\alpha, e_{-\alpha}]\rangle = \alpha_i\langle e_\alpha, e_{-\alpha}\rangle,
   \]

   and, since $\langle e_\alpha, e_{-\alpha}\rangle$ does not vanish, $[e_\alpha, e_{-\alpha}]$ is non-zero. We can
   therefore choose to normalize the $e_{\pm\alpha}$ so that

   \[
   [e_\alpha, e_{-\alpha}] = \frac{2\alpha^i}{\alpha^2}\,h_i \stackrel{\rm def}{=} h_\alpha,
   \]

   where $\alpha^i = g^{ij}\alpha_j$, and $h_\alpha$ obeys

   \[
   [h_\alpha, e_{\pm\alpha}] = \pm 2 e_{\pm\alpha}.
   \]

   The $h_\alpha$ are called the co-roots.

vi) The importance of the co-roots stems from the observation that the
   triad $h_\alpha$, $e_{\pm\alpha}$ obey the same commutation relations as $\sigma_3$ and $\sigma_\pm$, and
   so form an su(2) subalgebra of $\mathfrak{g}$. In particular $h_\alpha$ (being the analogue
   of $2J_3$) has only integer eigenvalues. For example, in su(3)

   \[
   \begin{aligned}
   [T_+, T_-] &= h_T = \Lambda_3,\\
   [V_+, V_-] &= h_V = \tfrac12\Lambda_3 + \tfrac{\sqrt3}{2}\Lambda_8,\\
   [U_+, U_-] &= h_U = -\tfrac12\Lambda_3 + \tfrac{\sqrt3}{2}\Lambda_8,
   \end{aligned}
   \]

   and in the defining representation

   \[
   h_T = \begin{pmatrix} 1&0&0\\ 0&-1&0\\ 0&0&0 \end{pmatrix},\quad
   h_V = \begin{pmatrix} 1&0&0\\ 0&0&0\\ 0&0&-1 \end{pmatrix},\quad
   h_U = \begin{pmatrix} 0&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix}
   \]

   have eigenvalues $\pm 1$ (and $0$).

vii) Since

   \[
   \mathrm{ad}(h_\alpha)e_\beta = [h_\alpha, e_\beta] = \frac{2\alpha\cdot\beta}{\alpha^2}\,e_\beta,
   \]

   we conclude that $2\alpha\cdot\beta/\alpha^2$ must be an integer for any pair of roots $\alpha$,
   $\beta$.

viii) Finally, there can only be one $e_\alpha$ for each root $\alpha$. If not, and there
   were an independent $e'_\alpha$, we could take linear combinations so that $e_{-\alpha}$
   and $e'_\alpha$ are Killing orthogonal, and hence $[e_{-\alpha}, e'_\alpha] = \alpha^i h_i\langle e_{-\alpha}, e'_\alpha\rangle = 0$.
   Thus $\mathrm{ad}(e_{-\alpha})e'_\alpha = 0$, and $e'_\alpha$ is killed by the step-down operator. It
   would therefore be the lowest weight in some su(2) representation. At
   the same time, however, $\mathrm{ad}(h_\alpha)e'_\alpha = 2e'_\alpha$, and we know that the lowest
   weight in any spin $J$ representation cannot have positive eigenvalue.


The condition that

\[
\frac{2\alpha\cdot\beta}{\alpha^2} \in \mathbb{Z}
\]

for any pair of roots tightly constrains the possible root systems, and is the
key to Cartan and Killing’s classification of the semisimple Lie algebras. For
example the angle $\theta$ between any pair of roots obeys $\cos^2\theta = n/4$, with $n$ an
integer, so $\theta$ can take only the values 0°, 30°, 45°, 60°, 90°, 120°, 135°, 150°, or 180°.
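
The list of angles follows because $4\cos^2\theta$ is the product of the two integers $2\alpha\cdot\beta/\alpha^2$ and $2\alpha\cdot\beta/\beta^2$, and so must be one of $0, 1, 2, 3, 4$. A short enumeration (the function name is illustrative) reproduces the list:

```python
import math

def allowed_angles():
    """Angles in degrees with 4*cos(theta)**2 equal to an integer n in 0..4."""
    angles = set()
    for n in range(5):
        c = math.sqrt(n) / 2          # |cos(theta)|
        for sign in (+1, -1):
            angles.add(round(math.degrees(math.acos(sign * c))))
    return sorted(angles)
```

Calling `allowed_angles()` returns the nine angles quoted above.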

These constraints lead to a complete classification of possible root systems 
into the following infinite families: 


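\[
\begin{array}{lll}
A_n, & n = 1, 2, \ldots, & \mathfrak{sl}(n+1, \mathbb{C}),\\
B_n, & n = 2, 3, \ldots, & \mathfrak{so}(2n+1, \mathbb{C}),\\
C_n, & n = 3, 4, \ldots, & \mathfrak{sp}(2n, \mathbb{C}),\\
D_n, & n = 4, 5, \ldots, & \mathfrak{so}(2n, \mathbb{C}),
\end{array}
\]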


together with the root systems $G_2$, $F_4$, $E_6$, $E_7$, and $E_8$ of the exceptional
algebras. The latter do not correspond to any of the classical matrix groups.
For example, $G_2$ is the root system of $\mathfrak{g}_2$, the Lie algebra of the group $G_2$ of
automorphisms of the octonions. This group is also the subgroup of SL(7)
preserving the general totally antisymmetric trilinear form.

The restrictions on the starting values of $n$ in these families are to avoid
repeats arising from “accidental” isomorphisms. If we allow $n = 1, 2, 3$ in
each series, then $C_1 = B_1 = A_1$. This corresponds to $\mathfrak{sp}(2,\mathbb{C}) \cong \mathfrak{so}(3,\mathbb{C}) \cong
\mathfrak{sl}(2,\mathbb{C})$. Similarly, $D_2 = A_1 + A_1$, corresponding to the isomorphism SO(4) $\cong$
(SU(2) $\times$ SU(2))/$\mathbb{Z}_2$, while $C_2 = B_2$ implies that, locally, the compact Sp(2) $\cong$
SO(5). Finally, $D_3 = A_3$ implies that SU(4)/$\mathbb{Z}_2$ $\cong$ SO(6).


15.3.4 Product representations 


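Given two representations $\Lambda_i^{(1)}$ and $\Lambda_i^{(2)}$ of $\mathfrak{g}$, we can form a new representation
that exponentiates to the tensor product of the corresponding representations
of the group $G$. Motivated by the result of exercise 14.13,

\[
\exp(A \otimes I_n + I_m \otimes B) = \exp(A) \otimes \exp(B), \tag{15.134}
\]

we take the representation matrices to act on the tensor product space as

\[
\Lambda_i^{(1\otimes 2)} = \Lambda_i^{(1)} \otimes I^{(2)} + I^{(1)} \otimes \Lambda_i^{(2)}. \tag{15.135}
\]

With this definition

\[
\begin{aligned}
[\Lambda_i^{(1\otimes 2)}, \Lambda_j^{(1\otimes 2)}]
&= [(\Lambda_i^{(1)} \otimes I^{(2)} + I^{(1)} \otimes \Lambda_i^{(2)}),\, (\Lambda_j^{(1)} \otimes I^{(2)} + I^{(1)} \otimes \Lambda_j^{(2)})]\\
&= [\Lambda_i^{(1)}, \Lambda_j^{(1)}] \otimes I^{(2)} + [\Lambda_i^{(1)}, I^{(1)}] \otimes \Lambda_j^{(2)}\\
&\quad + \Lambda_j^{(1)} \otimes [I^{(2)}, \Lambda_i^{(2)}] + I^{(1)} \otimes [\Lambda_i^{(2)}, \Lambda_j^{(2)}]\\
&= [\Lambda_i^{(1)}, \Lambda_j^{(1)}] \otimes I^{(2)} + I^{(1)} \otimes [\Lambda_i^{(2)}, \Lambda_j^{(2)}],
\end{aligned}
\tag{15.136}
\]

showing that the $\Lambda_i^{(1\otimes 2)}$ obey the Lie algebra as required.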


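This process of combining representations is analogous to the addition
of angular momentum in quantum mechanics. Perhaps more precisely, the
addition of angular momentum is an example of this general construction.
If representation $\Lambda^{(1)}$ has weights $m_i^{(1)}$, i.e. $h_i^{(1)}|m^{(1)}\rangle = m_i^{(1)}|m^{(1)}\rangle$, and $\Lambda^{(2)}$
has weights $m_i^{(2)}$, then, writing $|m^{(1)}, m^{(2)}\rangle$ for $|m^{(1)}\rangle \otimes |m^{(2)}\rangle$, we have

\[
\begin{aligned}
h_i^{(1\otimes 2)}|m^{(1)}, m^{(2)}\rangle &= (h_i^{(1)} \otimes 1 + 1 \otimes h_i^{(2)})\,|m^{(1)}, m^{(2)}\rangle\\
&= (m_i^{(1)} + m_i^{(2)})\,|m^{(1)}, m^{(2)}\rangle,
\end{aligned}
\tag{15.137}
\]

so the weights appearing in the representation $\Lambda^{(1\otimes 2)}$ are $m_i^{(1)} + m_i^{(2)}$.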
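
Both statements, that the tensor-sum matrices obey the same Lie algebra and that weights add, can be checked numerically. The sketch below takes su(2) as the algebra, with the standard spin-1/2 and spin-1 matrices obeying $[J_1, J_2] = iJ_3$; the choice of example and all helper names are assumptions made for illustration.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def commutator(a, b):
    return add(matmul(a, b), matmul(b, a), sign=-1)

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product; the row index (i, k) maps to i * len(b) + k."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def tensor_sum(a, b):
    """A (x) 1 + 1 (x) B, the combination used in (15.135)."""
    return add(kron(a, identity(len(b))), kron(identity(len(a)), b))

def max_abs(a):
    return max(abs(x) for row in a for x in row)

R = 1 / math.sqrt(2)
SPIN_HALF = [
    [[0, 0.5], [0.5, 0]],
    [[0, -0.5j], [0.5j, 0]],
    [[0.5, 0], [0, -0.5]],
]
SPIN_ONE = [
    [[0, R, 0], [R, 0, R], [0, R, 0]],
    [[0, -1j * R, 0], [1j * R, 0, -1j * R], [0, 1j * R, 0]],
    [[1, 0, 0], [0, 0, 0], [0, 0, -1]],
]
# Generators acting on the six-dimensional product space.
TOTAL = [tensor_sum(a, b) for a, b in zip(SPIN_HALF, SPIN_ONE)]
```

The diagonal of `TOTAL[2]` lists the six sums $m^{(1)} + m^{(2)}$ with $m^{(1)} = \pm 1/2$ and $m^{(2)} = 1, 0, -1$.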


The new representation is usually decomposable. We are familiar with
this decomposition for angular momentum where, if $j \ge j'$,

\[
j \otimes j' = (j + j') \oplus (j + j' - 1) \oplus \cdots \oplus (j - j'). \tag{15.138}
\]

This can be understood from adding weights. For example consider adding
the weights of $j = 1/2$, which are $m = \pm 1/2$, to those of $j = 1$, which are
$m = -1, 0, 1$. We get $m = -3/2$, $-1/2$ (twice), $+1/2$ (twice) and $m = 3/2$.
These decompose as shown in figure 15.6.
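
The weight-peeling procedure is easy to mechanize for su(2). The following sketch (function names are illustrative) adds the weights of two irreps and repeatedly strips off the irrep whose highest weight is the largest weight remaining.

```python
from collections import Counter
from fractions import Fraction

def weights(j):
    """Weights m = j, j-1, ..., -j of the spin-j irrep."""
    j = Fraction(j)
    return [j - k for k in range(int(2 * j) + 1)]

def decompose_product(j1, j2):
    """Spins of the irreps in j1 (x) j2, found by peeling off weights."""
    pool = Counter(m1 + m2 for m1 in weights(j1) for m2 in weights(j2))
    spins = []
    while any(count > 0 for count in pool.values()):
        j = max(m for m, count in pool.items() if count > 0)
        spins.append(j)
        pool.subtract(weights(j))     # remove one copy of the spin-j irrep
    return spins
```

For $j = 1/2$ and $j' = 1$ the function returns the spins $3/2$ and $1/2$, in agreement with (15.138).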


Figure 15.6: The weights for $1/2 \otimes 1 = 3/2 \oplus 1/2$.


The rules for decomposing products in other groups are more complicated
than for SU(2), but can be obtained from weight diagrams in the same
manner. In SU(3), we have, for example

\[
\begin{aligned}
3 \otimes \bar 3 &= 1 \oplus 8,\\
3 \otimes 8 &= 3 \oplus \bar 6 \oplus 15,\\
8 \otimes 8 &= 1 \oplus 8 \oplus 8 \oplus 10 \oplus \overline{10} \oplus 27.
\end{aligned}
\tag{15.139}
\]

To illustrate the first of these we show, in figure 15.7, the addition of the
weights in $\bar 3$ to each of the weights in the $3$.

Figure 15.7: Adding the weights of $\bar 3$ and $3$.


The resultant weights decompose (uniquely) into the weight diagrams for the
$8$ together with a singlet.


15.3.5 Sub-algebras and branching rules 


As with finite groups, a representation that is irreducible under the full Lie 
group or algebra will in general become reducible when restricted to a sub- 
group or sub-algebra. The pattern of the decomposition is again called a 
branching rule. Here, we provide some examples to illustrate the ideas. 

The three operators $V_\pm$ and $h_V = \tfrac12\Lambda_3 + \tfrac{\sqrt3}{2}\Lambda_8$ of su(3) form a Lie sub-algebra
that is isomorphic to su(2) under the map that takes them to $\sigma_\pm$
and $\sigma_3$ respectively. When restricted to this sub-algebra, the 8-dimensional
representation of su(3) becomes reducible, decomposing as

\[
8 = 3 \oplus 2 \oplus 2 \oplus 1, \tag{15.140}
\]

where the $3$, $2$ and $1$ are the $j = 1$, $\tfrac12$ and $0$ representations of su(2).


We can visualize this decomposition as coming about by first projecting
the $(\lambda_3, \lambda_8)$ weights to the “$m$” of the $|j, m\rangle$ labelling of su(2) as

\[
m = \tfrac14\lambda_3 + \tfrac{\sqrt3}{4}\lambda_8, \tag{15.141}
\]

and then stripping off the su(2) irreps as we did when decomposing product
representations.

Figure 15.8: Projection of the su(3) weights on to su(2), and the decomposition
$8 = 3 \oplus 2 \oplus 2 \oplus 1$.


This branching pattern occurs in the strong interactions, where the mass
of the strange quark $s$ being much larger than that of the light quarks $u$ and
$d$ causes the octet of pseudo-scalar mesons, which would all have the same
mass if SU(3) flavour symmetry were exact, to decompose into the triplet of
pions $\pi^+$, $\pi^0$ and $\pi^-$, the pair $K^+$ and $K^0$, their antiparticles $K^-$ and $\bar K^0$,
and the singlet $\eta$.


There are obviously other su(2) sub-algebras consisting of $\{T_\pm, h_T\}$ and
$\{U_\pm, h_U\}$, each giving rise to similar decompositions. These sub-algebras,
and a continuous infinity of related ones, are obtained from the $\{V_\pm, h_V\}$
algebra by conjugation by elements of SU(3).


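Another, unrelated, su(2) sub-algebra consists of

\[
\begin{aligned}
\sigma_+ &= \sqrt2\,(U_+ + T_+),\\
\sigma_- &= \sqrt2\,(U_- + T_-),\\
\sigma_3 &= 2h_V = (\Lambda_3 + \sqrt3\,\Lambda_8).
\end{aligned}
\tag{15.142}
\]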

The factor of two between the assignment $\sigma_3 \sim h_V$ of our previous example
and the present assignment $\sigma_3 \sim 2h_V$ has a non-trivial effect on the branching
rules. Under restriction to this new subalgebra, the $8$ of su(3) decomposes
as

\[
8 = 5 \oplus 3, \tag{15.143}
\]

Figure 15.9: The projection and decomposition for $8 = 5 \oplus 3$.


where the $5$ and $3$ are the $j = 2$ and $j = 1$ representations of su(2). A clue
to the origin and significance of this sub-algebra is found by noting that the
$3$ and $\bar 3$ representations of su(3) both remain irreducible, but project to the
same $j = 1$ representation of su(2). Interpreting this $j = 1$ representation
as the defining vector representation of so(3) suggests (correctly) that our
new su(2) sub-algebra is the Lie algebra of the SO(3) subgroup of SU(3)
consisting of SU(3) matrices with real entries.
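
Both branching rules can be reproduced by projecting weights and peeling off su(2) irreps. In the sketch below a weight $(\lambda_3, \lambda_8)$ is stored as a pair $(a, b)$ with $\lambda_3 = a$ and $\lambda_8 = b/\sqrt3$, so that $\lambda_3 + \sqrt3\lambda_8 = a + b$ and all arithmetic is exact. This encoding, the `scale` parameter ($m = \text{scale}\times(\lambda_3 + \sqrt3\lambda_8)$, equal to $1/4$ for the $\sigma_3 \sim h_V$ embedding and $1/2$ for $\sigma_3 \sim 2h_V$), and the function names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

# Weights (a, b) meaning lambda_3 = a and lambda_8 = b / sqrt(3).
ADJOINT = [(2, 0), (-2, 0), (1, 3), (-1, -3), (-1, 3), (1, -3), (0, 0), (0, 0)]
TRIPLET = [(1, 1), (-1, 1), (0, -2)]

def project(weight, scale):
    """su(2) weight m = scale * (lambda_3 + sqrt(3) * lambda_8)."""
    a, b = weight
    return Fraction(scale) * (a + b)

def peel_su2(ms):
    """Dimensions 2j+1 of the su(2) irreps contained in a list of m values."""
    pool = Counter(ms)
    dims = []
    while any(count > 0 for count in pool.values()):
        j = max(m for m, count in pool.items() if count > 0)
        dims.append(int(2 * j + 1))
        pool.subtract(j - k for k in range(int(2 * j) + 1))
    return dims

def branch(weight_list, scale):
    return peel_su2(project(w, scale) for w in weight_list)
```

With `scale = Fraction(1, 4)` the adjoint weights give dimensions 3, 2, 2, 1, and with `scale = Fraction(1, 2)` they give 5 and 3, while the defining representation stays irreducible as a single $j = 1$ triplet.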


15.4 Further exercises and problems 


Exercise 15.25: A Lie group manifold $G$ has the property that it is parallelizable.
This term means that we can find a globally smooth basis for the tangent
spaces. We can, for example, take the basis vectors to be the left-invariant
fields $L_i$. The existence of a positive-definite Killing metric also makes a compact
Lie group into a Riemann manifold. In the basis formed from the $L_i$, the
metric tensor $g_{ij} = \langle L_i, L_j\rangle$ is then numerically constant.


We may use the globally-defined $L_i$ basis to define a connection and covariant
derivative by setting $\nabla_{L_i}L_j = 0$. When we do this, the connection components
$\omega^k{}_{ij}$ are all zero, as are all components of the Riemann curvature tensor. The
connection is therefore flat. The individual vectors composing a vector field
with position-independent components are therefore, by definition, parallel to
each other.


a) Show that this flat connection is compatible with the metric, but is not
   torsion free.

b) Define a new connection and covariant derivative by setting $\nabla_{L_i}L_j =
   \tfrac12[L_i, L_j]$. Show that this new connection remains compatible with the
   metric but is now torsion free. It is therefore the Riemann connection.
   Compute the components $\omega^k{}_{ij}$ of the new connection in terms of the
   structure constants defined by $[L_i, L_j] = -f_{ij}{}^k L_k$. Similarly compute
   the components of the Riemann curvature tensor.

c) Show that, for any constants $a^i$, the parametrized curves $g(t) = \mathrm{Exp}(ta^iL_i)g(0)$
   are geodesics of the Riemann metric.


Exercise 15.26: Campbell-Baker-Hausdorff Formulae. Here are some useful
formulae for working with exponentials of matrices that do not commute with
each other.


a) Let $X$ and $Y$ be matrices. Show that

   \[
   e^{tX}Ye^{-tX} = Y + t[X, Y] + \frac{t^2}{2}[X, [X, Y]] + \cdots,
   \]

   the terms on the right being the series expansion of $\exp[\mathrm{ad}(tX)]Y$.

b) Let $X$ and $\delta X$ be matrices. Show that

   \[
   \begin{aligned}
   e^{-X}e^{X+\delta X} &= 1 + \int_0^1 e^{-tX}\,\delta X\,e^{tX}\,dt + O[(\delta X)^2]\\
   &= 1 + \delta X - \frac12[X, \delta X] + \frac{1}{3!}[X, [X, \delta X]] + \cdots + O[(\delta X)^2]\\
   &= 1 + \left(\frac{1 - e^{-\mathrm{ad}(X)}}{\mathrm{ad}(X)}\right)\delta X + O[(\delta X)^2].
   \end{aligned}
   \tag{15.144}
   \]

c) By expanding out the exponentials, show that

   \[
   e^Xe^Y = e^{X + Y + \frac12[X, Y] + \text{higher}},
   \]

   where “higher” means terms of higher order in $X$, $Y$. The next two terms
   are, in fact, $\tfrac{1}{12}[X, [X, Y]] + \tfrac{1}{12}[Y, [Y, X]]$. You will find the general formula
   in part d).

d) By using the formula from part b), show that $e^Xe^Y$ can be written
   as $e^Z$, where

   \[
   Z = X + \int_0^1 g\bigl(e^{\mathrm{ad}(X)}e^{\mathrm{ad}(tY)}\bigr)\,Y\,dt.
   \]

   Here,

   \[
   g(z) \equiv \frac{\ln z}{1 - 1/z}
   \]

   has a power-series expansion

   \[
   g(z) = 1 + \frac12(z - 1) - \frac16(z - 1)^2 + \frac{1}{12}(z - 1)^3 + \cdots,
   \]

   which is convergent for $|z - 1| < 1$. Show that $g(e^{\mathrm{ad}(X)}e^{\mathrm{ad}(tY)})$ can be
   expanded as a double power series in $\mathrm{ad}(X)$ and $\mathrm{ad}(tY)$, provided $X$ and
   $Y$ are small enough. This $\mathrm{ad}(X)$, $\mathrm{ad}(tY)$ expansion allows us to evaluate
   the product of two matrix exponentials as a third matrix exponential
   provided we know their commutator algebra.


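Exercise 15.27: SU(2) disentangling theorems: Almost any $2 \times 2$ matrix can
be factored (a Gaussian decomposition) as

\[
\begin{pmatrix} a & b\\ c & d \end{pmatrix}
= \begin{pmatrix} 1 & \alpha\\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \lambda & 0\\ 0 & \mu \end{pmatrix}
\begin{pmatrix} 1 & 0\\ \beta & 1 \end{pmatrix}.
\]

Use this trick to work the following problems:

a) Show that

   \[
   \exp\left\{\frac{\theta}{2}\bigl(e^{i\phi}\sigma_+ - e^{-i\phi}\sigma_-\bigr)\right\}
   = \exp(\alpha\sigma_+)\exp(\lambda\sigma_3)\exp(\beta\sigma_-),
   \]

   where $\sigma_\pm = (\sigma_1 \pm i\sigma_2)/2$, and

   \[
   \begin{aligned}
   \alpha &= e^{i\phi}\tan\theta/2,\\
   \lambda &= -\ln\cos\theta/2,\\
   \beta &= -e^{-i\phi}\tan\theta/2.
   \end{aligned}
   \]

b) Use the fact that the spin-$\tfrac12$ representation of SU(2) is faithful, to show
   that

   \[
   \exp\left\{\frac{\theta}{2}\bigl(e^{i\phi}J_+ - e^{-i\phi}J_-\bigr)\right\}
   = \exp(\alpha J_+)\exp(2\lambda J_3)\exp(\beta J_-),
   \]

   where $J_\pm = J_1 \pm iJ_2$. Take care, the reasoning here is subtle! Notice
   that the series expansion of exponentials of $\sigma_\pm$ truncates after the second
   term, but the same is not true of the expansion of exponentials of the $J_\pm$.
   You need to explain why the formula continues to hold in the absence of
   this truncation.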


Exercise 15.28: Recall that the Lie algebra so($N$) of the group SO($N$)
consists of the skew-symmetric $N$-by-$N$ matrices $A$ with entries $A_{\mu\nu} = -A_{\nu\mu}$.
Let $\hat\gamma_\mu$, $\mu = 1, \ldots, N$, be the Dirac gamma matrices, and define $\Gamma_{\mu\nu}$ to be the
Hermitian matrix $\tfrac{1}{4i}[\hat\gamma_\mu, \hat\gamma_\nu]$. Construct the skew-hermitian matrix $\Gamma(A)$ from
$A$ by setting

\[
\Gamma(A) = \frac{i}{2}\sum_{\mu\nu}A_{\mu\nu}\Gamma_{\mu\nu},
\]

and similarly construct $\Gamma(B)$ and $\Gamma([A, B])$ from the skew-symmetric matrices
$B$ and $[A, B]$. Show that

\[
[\Gamma(A), \Gamma(B)] = \Gamma([A, B]).
\]

Conclude that the map $A \mapsto \Gamma(A)$ is a representation of so($N$).


Exercise 15.29: Invariant tensors for SU(3). Let $\lambda_i$ be the Gell-Mann lambda
matrices. The totally antisymmetric structure constants, $f_{ijk}$, and a set of
totally symmetric constants, $d_{ijk}$, are defined by

\[
f_{ijk} \stackrel{\rm def}{=} \frac12\operatorname{tr}(\lambda_i[\lambda_j, \lambda_k]),\qquad
d_{ijk} \stackrel{\rm def}{=} \frac12\operatorname{tr}(\lambda_i\{\lambda_j, \lambda_k\}).
\]

In the second expression, the braces denote an anticommutator:

\[
\{x, y\} \stackrel{\rm def}{=} xy + yx.
\]

Let $D^8_{ij}(g)$ be the matrices representing SU(3) in “8”, the eight-dimensional
adjoint representation.


a) Show that

   \[
   \begin{aligned}
   f_{ijk} &= D^8_{il}(g)D^8_{jm}(g)D^8_{kn}(g)\,f_{lmn},\\
   d_{ijk} &= D^8_{il}(g)D^8_{jm}(g)D^8_{kn}(g)\,d_{lmn},
   \end{aligned}
   \]

   and so $f_{ijk}$ and $d_{ijk}$ are invariant tensors in the same sense that $\delta_{ij}$ and
   $\epsilon_{i_1\ldots i_n}$ are invariant tensors for SO($n$).

b) Let $w_i = f_{ijk}u_jv_k$. Show that if $u_i \to D^8_{ij}(g)u_j$ and $v_i \to D^8_{ij}(g)v_j$, then
   $w_i \to D^8_{ij}(g)w_j$. Similarly for $w_i = d_{ijk}u_jv_k$. (Hint: show first that the
   $D^8$ matrices are real and orthogonal.) Deduce that $f_{ijk}$ and $d_{ijk}$ are
   Clebsch-Gordan coefficients for the $8 \oplus 8$ part of the decomposition

   \[
   8 \otimes 8 = 1 \oplus 8 \oplus 8 \oplus 10 \oplus \overline{10} \oplus 27.
   \]

c) Similarly show that $\delta_{\alpha\beta}$ and the entries in the lambda matrices $(\lambda_i)_{\alpha\beta}$
   can be regarded as Clebsch-Gordan coefficients for the decomposition

   \[
   3 \otimes \bar 3 = 1 \oplus 8.
   \]

d) Use the graphical method of plotting weights and peeling off irreps to
   obtain the tensor product decomposition in part b).

Chapter 16


The Geometry of Fibre Bundles 


In earlier chapters we have used the language of bundles and connections, but 
in a relatively casual manner. We deferred proper mathematical definitions 
until now, because, for the applications we meet in physics, it helps to first 
have acquired an understanding of the geometry of Lie groups. 


16.1 Fibre bundles 


We begin with a formal definition of a bundle and then illustrate the defini- 
tion with examples from quantum mechanics. These allow us to appreciate 
the physics that the definition is designed to capture. 


16.1.1 Definitions 


A smooth bundle comprises three ingredients: $E$, $\pi$ and $M$, where $E$ and
$M$ are manifolds, and $\pi : E \to M$ is a smooth surjective (onto) map. The
manifold $E$ is the total space, $M$ is the base space and $\pi$ is the projection
map. The inverse image $\pi^{-1}(x)$ of a point in $M$ (i.e., the set of points in $E$
that map to $x$ in $M$) is the fibre over $x$.

We usually require that all fibres be diffeomorphic to some fixed manifold
$F$. The bundle is then a fibre bundle, and $F$ is “the fibre” of the bundle. In
a similar vein, we sometimes also refer to the total space $E$ as “the bundle.”
Examples of possible fibres are vector spaces (in which case we have a vector
bundle), spheres (in which case we have a sphere bundle), and Lie groups.
When the fibre is a Lie group we speak of a principal bundle. A principal
bundle can be thought of as the parent of various associated bundles, which are
constructed by allowing the Lie group to act on a fibre. A bundle whose fibre
is a one-dimensional vector space is called a line bundle.

The simplest example of a fibre bundle consists of setting $E$ equal to the
Cartesian product $M \times F$ of the base space and the fibre. In this case the
projection just “forgets” the point $f \in F$, and so $\pi : (x, f) \mapsto x$.

A more interesting example can be constructed by taking $M$ to be the
circle $S^1$ equipped with co-ordinate $\theta$, and $F$ as the one-dimensional interval
$I = [-1, 1]$. We can assemble these ingredients to make $E$ into a Möbius
strip. We do this by gluing the copy of $I$ over $\theta = 2\pi$ to that over $\theta = 0$ with
a half twist so that the end $-1 \in [-1, 1]$ is attached to $+1$, and vice versa.


Figure 16.1: Möbius strip bundle, together with a section $\phi$.


A bundle that is a Cartesian product $E = M \times F$ is said to be trivial.
The Möbius strip is not a Cartesian product, and is said to be a twisted
bundle. The Möbius strip is, however, locally trivial in that for each $x \in M$
there is an open retractable neighbourhood $U \subset M$ of $x$ in which $E$ looks
like a product $U \times F$. We will assume that all our bundles are locally trivial
in this sense. If $\{U_i\}$ is a cover of $M$ (i.e., if $M = \bigcup U_i$) by such retractable
neighbourhoods, and $F$ is a fixed fibre, then a bundle can be assembled out
of the collection of $U_i \times F$ product bundles by giving gluing rules that identify
points on the fibre over $x \in U_i$ in the product $U_i \times F$ with points in the fibre
over $x \in U_j$ in $U_j \times F$ for each $x \in U_i \cap U_j$. These identifications are made
by means of invertible maps $\varphi_{U_iU_j}(x) : F \to F$ that are defined for each $x$
in the overlap $U_i \cap U_j$. The $\varphi_{U_iU_j}$ are known as transition functions. They
must satisfy the consistency conditions

\[
\begin{aligned}
\varphi_{U_iU_i}(x) &= \text{Identity},\\
\varphi_{U_iU_j}(x) &= \varphi^{-1}_{U_jU_i}(x),\\
\varphi_{U_iU_j}(x)\,\varphi_{U_jU_k}(x)\,\varphi_{U_kU_i}(x) &= \text{Identity},\qquad x \in U_i \cap U_j \cap U_k \ne \emptyset.
\end{aligned}
\tag{16.1}
\]


A section of a fibre bundle $(E, \pi, M)$ is a smooth map $\phi : M \to E$ such
that $\phi(x)$ lies in the fibre $\pi^{-1}(x)$ over $x$. Thus $\pi \circ \phi = \text{Identity}$. When the
total space $E$ is a product $M \times F$ this $\phi$ is simply a function $\phi : M \to F$.
When the bundle is twisted, as is the Möbius strip, then the section is no
longer a function as it takes no unique value at the points $x$ above which
the fibres are being glued together. Observe that in the Möbius strip the
half-twist forces the section $\phi(x)$ to pass through $0 \in [-1, 1]$. The Möbius
bundle therefore has no nowhere-zero globally defined sections. Many twisted
bundles have no globally defined sections at all.
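
The forced zero is an instance of the intermediate value theorem. In the trivialization over $0 \le \theta \le 2\pi$ a continuous section is a function obeying $\phi(2\pi) = -\phi(0)$, so a sign change must occur. The sketch below locates it by bisection; `example_section` is an arbitrary illustrative choice, not a section taken from the text.

```python
import math

def find_zero(section, tol=1e-12):
    """Bisection on [0, 2*pi] for a continuous section obeying
    section(2*pi) == -section(0), the Mobius gluing condition."""
    lo, hi = 0.0, 2 * math.pi
    f_lo = section(lo)
    if f_lo == 0.0:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = section(mid)
        if f_mid == 0.0:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid     # sign change lies to the right
        else:
            hi = mid                  # sign change lies to the left
    return 0.5 * (lo + hi)

def example_section(theta):
    """Takes values in [-1, 1] and flips sign between theta = 0 and 2*pi."""
    return 0.8 * math.cos(theta / 2) - 0.1 * math.sin(theta)
```

For this example the zero sits at $\theta = \pi$.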


16.2 Physics examples 


We now provide three applications where the bundle concept appears in 
quantum mechanics. The first two illustrations are re-expressions of well- 
known physics. The third, the geometric approach to quantization, is perhaps 
less familiar. 


16.2.1 Landau levels 


Consider the Schrödinger eigenvalue problem

\[
-\frac{1}{2m}\left(\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2}\right) = E\psi \tag{16.2}
\]


for a particle moving on a flat two-dimensional torus. We think of the torus
as an $L_x \times L_y$ rectangle with the understanding that as a particle disappears
through the right-hand boundary it immediately re-appears at the point with
the same $y$ co-ordinate on the left-hand boundary; similarly for the upper
and lower boundaries. In quantum mechanics we implement these rules by
imposing periodic boundary conditions on the wave function:

\[
\psi(0, y) = \psi(L_x, y),\qquad \psi(x, 0) = \psi(x, L_y). \tag{16.3}
\]

These conditions make the wavefunction a well-defined and continuous function
on the torus, in the sense that after pasting the edges of the rectangle
together to make a real toroidal surface the function has no jumps, and each
point on the surface assigns a unique value to $\psi$. The wavefunction is a
section of an untwisted line bundle with the torus as its base-space, the fibre
over $(x, y)$ being the one-dimensional complex vector space $\mathbb{C}$ in which
$\psi(x, y)$ takes its value.

Now try to carry out the same program for a particle of charge $e$ moving in
a uniform magnetic field $B$ perpendicular to the $x$–$y$ plane. The Schrödinger
equation becomes

\[
-\frac{1}{2m}\left(\frac{\partial}{\partial x} - ieA_x\right)^2\psi - \frac{1}{2m}\left(\frac{\partial}{\partial y} - ieA_y\right)^2\psi = E\psi, \tag{16.4}
\]

where $(A_x, A_y)$ is the vector potential. We at once meet a problem. Although
the magnetic field is constant, the vector potential cannot be chosen to be
constant, or even periodic. In the Landau gauge, for example, where we set
$A_x = 0$, the remaining component becomes $A_y = Bx$. This means that as the
particle moves out of the right-hand edge of the rectangle representing the
torus we must perform a gauge transformation that prepares it for motion in
the $(A_x, A_y)$ field it will encounter when it reappears at the left. If equation
(16.4) holds, then it continues to hold after the simultaneous change

\[
\begin{aligned}
\psi(x, y) &\to e^{-ieBL_xy}\,\psi(x, y),\\
-ieA_y &\to -ieA_y + e^{-ieBL_xy}\frac{\partial}{\partial y}e^{+ieBL_xy} = -ie(A_y - BL_x).
\end{aligned}
\tag{16.5}
\]

At the right-hand boundary $x = L_x$ this gauge transformation resets the
vector potential $A_y$ back to its value at the left-hand boundary. Accordingly,
we modify the boundary conditions to

\[
\psi(0, y) = e^{-ieBL_xy}\,\psi(L_x, y),\qquad \psi(x, 0) = \psi(x, L_y). \tag{16.6}
\]

The new boundary conditions make the wavefunction into a section¹ of a
twisted line bundle over the torus. The fibre is again the one-dimensional
complex vector space $\mathbb{C}$.


¹ That the wave “function” is no longer a function should not be disturbing.
Schrödinger’s $\psi$ is never really a function of space-time. Seen from a frame moving at
velocity $v$, $\psi(x, t)$ acquires a factor of $\exp(-imvx - imv^2t/2)$, and this is no way for a
self-respecting function of $x$ and $t$ to behave.

We have already met the language in which the gauge field $-ieA_\mu$ is
called a connection on the bundle, and the associated $ieB$ field is the curvature.
We will explain how connections fit into the formal bundle language in section
16.3.

The twisting of the boundary conditions by the gauge transformation
seems innocent, but within it lurks an important constraint related to the
consistency conditions in (16.1). We can find the value of $\psi(L_x, L_y)$ from that
of $\psi(0, 0)$ by using the relations in (16.6) in the order $\psi(0, 0) \to \psi(0, L_y) \to
\psi(L_x, L_y)$, or in the order $\psi(0, 0) \to \psi(L_x, 0) \to \psi(L_x, L_y)$. Since we must
obtain the same $\psi(L_x, L_y)$ whichever route we use, we need to satisfy the
condition

\[
e^{ieBL_xL_y} = 1. \tag{16.7}
\]


This tells us that the Schrödinger problem makes sense only when the magnetic
flux $BL_xL_y$ through the torus obeys

\[
eBL_xL_y = 2\pi N \tag{16.8}
\]

for some integer $N$. We cannot continuously vary the flux through a finite
torus. This means that if we introduce torus boundary conditions as a
mathematical convenience in a calculation, then physical effects may depend
discontinuously on the field.

The integer $N$ counts the number of times the phase of the wavefunction
is twisted as we travel from $(x, y) = (L_x, 0)$ to $(x, y) = (L_x, L_y)$ gluing the
right-hand edge wavefunction back to the left-hand edge wavefunction.
This twisting number is a topological invariant. We have met this invariant
before, in section 13.6. It is the first Chern number of the wavefunction
bundle. If we permit $B$ to become position dependent without altering the
total twist $N$, then quantities such as energies and expectation values can
change smoothly with $B$. If $N$ is allowed to change, however, then these
quantities may jump discontinuously.

The energy $E = E_n$ solutions to (16.4) with boundary conditions (16.6)
are given by

\[
\Psi_{n,k}(x, y) = \sum_{p=-\infty}^{\infty}\psi_n\!\left(x - \frac{k}{eB} - pL_x\right)e^{i(eBpL_x + k)y}. \tag{16.9}
\]

Here, $\psi_n(x)$ is a harmonic-oscillator wavefunction obeying

\[
-\frac{1}{2m}\frac{d^2\psi_n}{dx^2} + \frac12 m\omega^2x^2\psi_n = E_n\psi_n, \tag{16.10}
\]

with $\omega = eB/m$ the classical cyclotron frequency, and $E_n = \omega(n + 1/2)$. The
parameter $k$ takes the values $2\pi q/L_y$ for $q$ an integer. At each energy $E_n$ we
obtain $N$ independent eigenfunctions as $q$ runs from 1 to $eBL_xL_y/2\pi$. These
$N$-fold degenerate states are the Landau levels. The degeneracy, being of
necessity an integer, provides yet another explanation for why the flux must
be quantized.
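
One can check numerically that the sum (16.9) satisfies the twisted boundary conditions (16.6) once the flux is quantized as in (16.8). The sketch below assumes units with $\hbar = 1$, takes $n = 0$ with the unnormalized Gaussian $\psi_0(x) = e^{-eBx^2/2}$, and truncates the sum over $p$; the numerical values of the constants are arbitrary illustrative choices.

```python
import cmath
import math

E_CHARGE = 1.0
LX, LY = 1.0, 1.0
N_FLUX = 2
# Choose B so that e * B * Lx * Ly = 2 * pi * N, as required by (16.8).
B = 2 * math.pi * N_FLUX / (E_CHARGE * LX * LY)

def psi0(x):
    """Unnormalized lowest oscillator state; m * omega = e * B."""
    return math.exp(-0.5 * E_CHARGE * B * x * x)

def landau_state(x, y, q, cutoff=30):
    """Truncated version of the sum (16.9) for n = 0 and k = 2*pi*q/Ly."""
    k = 2 * math.pi * q / LY
    total = 0j
    for p in range(-cutoff, cutoff + 1):
        centre = k / (E_CHARGE * B) + p * LX
        total += psi0(x - centre) * cmath.exp(1j * (E_CHARGE * B * p * LX + k) * y)
    return total
```

Comparing `landau_state(0, y, q)` with `cmath.exp(-1j * E_CHARGE * B * LX * y) * landau_state(LX, y, q)` shows agreement to rounding error, because the shift $x \to x + L_x$ merely relabels $p \to p - 1$ in the sum.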


16.2.2 The Berry connection 


Suppose we are in possession of a quantum-mechanical hamiltonian $H(\xi)$ depending
on some parameters $\xi = (\xi^1, \xi^2, \ldots) \in M$, and know the eigenstates
$|n; \xi\rangle$ that obey

\[
H(\xi)|n; \xi\rangle = E_n(\xi)|n; \xi\rangle. \tag{16.11}
\]

If, for fixed $n$, we can find a smooth family of eigenstates $|n; \xi\rangle$, one for
every $\xi$ in the parameter space $M$, we have a vector bundle over the space
$M$. The fibre above $\xi$ is the one-dimensional vector space spanned by $|n; \xi\rangle$.
This bundle is a sub-bundle of the product bundle $M \times \mathcal{H}$ where $\mathcal{H}$ is the
Hilbert space on which $H$ acts. Although the larger bundle is not twisted,
the sub-bundle may be. It may also not exist: if the state $|n; \xi\rangle$ becomes
degenerate with another state $|m; \xi\rangle$ at some value of $\xi$, then both states
can vary discontinuously with the parameters, and we wish to exclude this
possibility.

In the previous paragraph we considered the evolution of the eigenstates
of a time-independent Hamiltonian as we varied its parameters. Another,
more physical, evolution is given by solving the time-dependent Schrödinger
equation

\[
i\partial_t|\psi(t)\rangle = H(\xi(t))|\psi(t)\rangle \tag{16.12}
\]

so as to follow the evolution of a state $|\psi(t)\rangle$ as the parameters are slowly varied.
If the initial state $|\psi(0)\rangle$ coincides with the eigenstate $|0; \xi(0)\rangle$, and
if the time evolution of the parameters is slow enough, then $|\psi(t)\rangle$ is expected to
remain close to the corresponding eigenstate $|0; \xi(t)\rangle$ of the time-independent
Schrödinger equation for the hamiltonian $H(\xi(t))$. To determine exactly how
“close” it stays, insert the expansion

\[
|\psi(t)\rangle = \sum_n a_n(t)\,|n; \xi(t)\rangle\exp\left\{-i\int_0^t E_0(\xi(t))\,dt\right\} \tag{16.13}
\]

into (16.12) and take the inner-product with $|m; \xi\rangle$. For $m \ne 0$, we expect
that the overlap $\langle m; \xi|\psi(t)\rangle$ will be small and of order $O(\partial\xi/\partial t)$. Assuming
that this is so, we read off that

\[
\dot a_0 + a_0\langle 0; \xi|\partial_\mu|0; \xi\rangle\frac{\partial\xi^\mu}{\partial t} = 0,\qquad (m = 0) \tag{16.14}
\]

\[
a_m = i\,a_0\,\frac{\langle m; \xi|\partial_\mu|0; \xi\rangle}{E_m - E_0}\frac{\partial\xi^\mu}{\partial t},\qquad (m \ne 0) \tag{16.15}
\]

up to first-order accuracy in time-derivatives of the $|n; \xi(t)\rangle$. Hence,

\[
|\psi(t)\rangle = e^{i\gamma_{\rm Berry}(t)}\left\{|0; \xi\rangle + i\sum_{m\ne 0}\frac{|m; \xi\rangle\langle m; \xi|\partial_\mu|0; \xi\rangle}{E_m - E_0}\frac{\partial\xi^\mu}{\partial t} + \cdots\right\}e^{-i\int_0^t E_0(t)\,dt}, \tag{16.16}
\]

where the dots refer to terms of higher order in time-derivatives.

Equation (16.16) constitutes the first two terms in a systematic adiabatic
series expansion. The factor $a_0(t) = \exp\{i\gamma_{\rm Berry}(t)\}$ is the solution of the
differential equation (16.14). The angle $\gamma_{\rm Berry}$ is known as Berry’s phase, after
the British mathematical physicist Michael Berry. It is needed to take up the
slack between the arbitrary $\xi$-dependent phase-choice at our disposal when
defining the $|0; \xi\rangle$, and the specific phase selected by the Schrödinger equation
as it evolves the state $|\psi(t)\rangle$. Berry’s phase is also called the geometric phase
because it depends only on the Hilbert-space geometry of the family of states
$|0; \xi\rangle$, and not on their energies. We can write

\[
\gamma_{\rm Berry}(t) = i\int_0^t\langle 0; \xi|\partial_\mu|0; \xi\rangle\frac{\partial\xi^\mu}{\partial t}\,dt \tag{16.17}
\]

and regard the one-form

\[
A_{\rm Berry} = \langle 0; \xi|\partial_\mu|0; \xi\rangle\,d\xi^\mu = \langle 0; \xi|d|0; \xi\rangle \tag{16.18}
\]

as a connection on the bundle of states over the space of parameters. The
equation

\[
\left(\frac{\partial}{\partial t} + A_{{\rm Berry},\mu}\frac{\partial\xi^\mu}{\partial t}\right)a_0 = 0 \tag{16.19}
\]

then identifies the Schrödinger time evolution with parallel transport. It
seems reasonable to refer to this particular form of parallel transport as
“Berry transport.”




In order for corrections to the approximation $|\psi(t)\rangle \approx (\text{phase})|0; \xi(t)\rangle$ to
remain small, we need the denominator $(E_m - E_0)$ to remain large when
compared to its numerator. The state that we are following must therefore
never become degenerate with any other state.


Monopole bundle 


Consider, for example, a spin-1/2 particle in a magnetic field. If the field
points in direction $\mathbf n$, the Hamiltonian is


$$ H(\mathbf n) = \mu|B|\,\boldsymbol\sigma\cdot\mathbf n. $$ (16.20)


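There are two eigenstates, with energies $E_\pm = \pm\mu|B|$. Let us focus on
the eigenstate $|\psi_+\rangle$, corresponding to $E_+$. For each $\mathbf n$ we can obtain an $E_+$
eigenstate by applying the projection operator


$$ P = \frac{1}{2}(I + \mathbf n\cdot\boldsymbol\sigma) = \frac{1}{2}\begin{pmatrix} 1+\cos\theta & e^{-i\phi}\sin\theta \\ e^{i\phi}\sin\theta & 1-\cos\theta \end{pmatrix} $$ (16.21)


to almost any vector, and then multiplying by a real normalization constant
$N$. Applying $P$ to a “spin-up” state, for example, gives


$$ |\psi_+^{(1)}(\mathbf n)\rangle = N\,\frac{1}{2}(I + \mathbf n\cdot\boldsymbol\sigma)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta/2 \\ e^{i\phi}\sin\theta/2 \end{pmatrix}. $$ (16.22)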


Here, $\theta$ and $\phi$ are spherical polar angles on $S^2$ that specify the direction of
$\mathbf n$.

Although the bundle of $E = E_+$ eigenstates is globally defined, the family
of states $|\psi_+^{(1)}(\mathbf n)\rangle$ that we have obtained, and would like to use as a basis for
the fibre over $\mathbf n$, becomes singular when $\mathbf n$ is in the vicinity of the south pole
$\theta = \pi$. This is because the factor $e^{i\phi}$ is multivalued at the south pole. There
is no problem at the north pole because the ambiguous phase $e^{i\phi}$ multiplies
$\sin\theta/2$, which is zero there.

Near the south pole, however, we can project from a “spin-down” state 
to find 


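$$ |\psi_+^{(2)}(\mathbf n)\rangle = N\,\frac{1}{2}(I + \mathbf n\cdot\boldsymbol\sigma)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} e^{-i\phi}\cos\theta/2 \\ \sin\theta/2 \end{pmatrix}. $$ (16.23)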


This family of eigenstates is smooth near the south pole, but is ill-defined at
the north pole. As in section 13.6, we are compelled to cover the sphere $S^2$
by two caps, $D_+$ and $D_-$, and use $|\psi_+^{(1)}\rangle$ in $D_+$ and $|\psi_+^{(2)}\rangle$ in $D_-$. The two
families are related by


$$ |\psi_+^{(1)}(\mathbf n)\rangle = e^{i\phi}\,|\psi_+^{(2)}(\mathbf n)\rangle $$ (16.24)


in the cingular overlap region $D_+ \cap D_-$. Here, $e^{i\phi}$ is the transition function
that glues the two families of eigenstates together.
The Berry connections are 


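$$ A^{(1)} = \langle\psi_+^{(1)}|d|\psi_+^{(1)}\rangle = -\frac{i}{2}(\cos\theta - 1)\,d\phi, $$


$$ A^{(2)} = \langle\psi_+^{(2)}|d|\psi_+^{(2)}\rangle = -\frac{i}{2}(\cos\theta + 1)\,d\phi. $$ (16.25)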


In their common domain of definition, they are related by a gauge
transformation:
$$ A^{(1)} = A^{(2)} + i\,d\phi. $$ (16.26)


The curvature of either connection is
$$ dA = \frac{i}{2}\sin\theta\,d\theta\,d\phi = \frac{i}{2}\,d({\rm Area}). $$ (16.27)


Being the area two-form, the curvature tells us that when we slowly change
the direction of $B$ and bring it back to its original orientation the spin state
will, in addition to the dynamical phase $\exp\{-iE_+t\}$, have accumulated a
phase equal to (minus) one-half of the area enclosed by the trajectory of $\mathbf n$
on $S^2$. The two-form field $dA$ can be thought of as the flux of a magnetic
monopole residing at the centre of the sphere. The corresponding bundle of
one-dimensional vector spaces, spanned by $|\psi_+(\mathbf n)\rangle$, over $\mathbf n \in S^2$ is therefore
called the monopole bundle.
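
The statement about the accumulated phase can be checked numerically. The following is a minimal sketch, assuming the north-pole-regular family (16.22) and a loop at fixed polar angle; the function names are ours, and the discretisation (summing the phases of link overlaps) is one common way to approximate $i\oint\langle\psi|d|\psi\rangle$, not a construction taken from the text.

```python
import cmath
import math

def psi_plus(theta, phi):
    """North-pole-regular eigenstate (cos(theta/2), e^{i phi} sin(theta/2))."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def overlap(a, b):
    """Inner product <a|b> of two 2-component spinors."""
    return a[0].conjugate() * b[0] + a[1].conjugate() * b[1]

def berry_phase_on_latitude(theta, steps=2000):
    """Discretised gamma = i * (loop integral of <psi|d|psi>) at fixed theta.

    Each link overlap <psi_k|psi_{k+1}> is approximately 1 + <psi|d|psi>,
    so the sum of the link phases approximates -gamma.
    """
    total = 0.0
    for k in range(steps):
        a = psi_plus(theta, 2 * math.pi * k / steps)
        b = psi_plus(theta, 2 * math.pi * (k + 1) / steps)
        total += cmath.phase(overlap(a, b))
    return -total

theta = math.pi / 3
gamma = berry_phase_on_latitude(theta)
# Area of the cap around the north pole enclosed by the loop.
enclosed_area = 2 * math.pi * (1 - math.cos(theta))
```

With these conventions `gamma` should agree with `-enclosed_area / 2` up to discretisation error, which is the "(minus) one-half of the area" rule.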


16.2.3. Quantization 


In this section we provide a short introduction to geometric quantization. 
This idea, due largely to Kirillov, Kostant and Souriau, extends the familiar
technique of canonical quantization to phase spaces with more structure
than that of the harmonic oscillator. We illustrate the formalism by quan- 
tizing spin, and show how the resulting Hilbert space provides an example of 
the Borel-Weil-Bott construction of the representations of a semi-simple Lie 
group as spaces of sections of holomorphic line bundles. 
Prequantization


The passage from classical mechanics to quantum mechanics involves re- 
placing the classical variables by operators in such a way that the classical 
Poisson-bracket algebra is mirrored by the operator commutator algebra. In 
general, this process of quantization is not possible without making some 
compromises. It is, however, usually possible to pre-quantize a phase-space 
and its associated Poisson algebra. 

Let $M$ be a $2n$-dimensional classical phase-space with its closed symplectic
form $\omega$. Classically a function $f : M \to \mathbb{R}$ gives rise to a Hamiltonian
vector field $v_f$ via Hamilton’s equations


$$ df = -i_{v_f}\omega. $$ (16.28)


We saw in section 11.4.2 that the closure condition $d\omega = 0$ ensures that
the Poisson bracket


$$ \{f, g\} = v_f g = \omega(v_f, v_g) $$ (16.29)

obeys
$$ [v_f, v_g] = v_{\{f,g\}}. $$ (16.30)
Now suppose that the cohomology class of $(2\pi\hbar)^{-1}\omega$ in $H^2(M,\mathbb{R})$ has the
property that its integrals over cycles in $H_2(M,\mathbb{Z})$ are integers. Then (it can
be shown) there exists a line bundle $L$ over $M$ with curvature $F = -i\hbar^{-1}\omega$.
If we locally write $\omega = d\eta$, where $\eta = \eta_\mu dx^\mu$, then the connection one-form
is $A = -i\hbar^{-1}\eta$ and the covariant derivative

$$ \nabla_v = v^\mu(\partial_\mu - i\hbar^{-1}\eta_\mu) $$ (16.31)


acts on sections of the line bundle. The corresponding curvature is
$$ F(u, v) = [\nabla_u, \nabla_v] - \nabla_{[u,v]} = -i\hbar^{-1}\omega(u, v). $$ (16.32)


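We define a pre-quantized operator $\rho(f)$ that, when acting on sections
$\Psi(x)$ of the line bundle, corresponds to the classical function $f$:


$$ \rho(f) = -i\hbar\nabla_{v_f} + f. $$ (16.33)
For Hamiltonian vector fields $v_f$ and $v_g$ we have


$$ \begin{aligned} [\hbar\nabla_{v_f} + if,\ \nabla_{v_g}] &= \hbar\nabla_{[v_f,v_g]} - i\omega(v_f, v_g) + i[f, \nabla_{v_g}] \\ &= \hbar\nabla_{[v_f,v_g]} - i(i_{v_f}\omega + df)(v_g) \\ &= \hbar\nabla_{[v_f,v_g]}, \end{aligned} $$ (16.34)
and so


$$ \begin{aligned} [-i\hbar\nabla_{v_f} + f,\ -i\hbar\nabla_{v_g} + g] &= -\hbar^2\nabla_{[v_f,v_g]} - i\hbar\,v_f g \\ &= -i\hbar\left(-i\hbar\nabla_{[v_f,v_g]} + \{f,g\}\right) \\ &= -i\hbar\left(-i\hbar\nabla_{v_{\{f,g\}}} + \{f,g\}\right). \end{aligned} $$ (16.35)


Equation (16.35) is Dirac’s quantization rule:


$$ i[\rho(f), \rho(g)] = \hbar\,\rho(\{f, g\}). $$ (16.36)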


The process of quantization is completed, when possible, by defining a 
polarization. This is a restriction on the variables that we allow the wave- 
functions to depend on. For example, if there is a global set of Darboux 
co-ordinates p,q we might demand that the wavefunction depend only on q, 
only on p, or only on the combination p+iq. Such a restriction is necessary so 
that the representation $f \mapsto \rho(f)$ be irreducible. As globally defined Darboux
co-ordinates do not usually exist, this step is the hard part of quantization. 
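
As a sketch of how the recipe works in the simplest case (an illustration of ours, assuming $M = \mathbb{R}^2$ with a global Darboux pair $p$, $q$, the form $\omega = dp \wedge dq$, and the particular choice $\eta = p\,dq$), Hamilton's equations (16.28) give $v_q = -\partial_p$ and $v_p = \partial_q$, while the covariant derivative (16.31) has components $\nabla_{\partial_q} = \partial_q - i\hbar^{-1}p$ and $\nabla_{\partial_p} = \partial_p$. The definition (16.33) then yields

$$ \rho(q) = q + i\hbar\,\partial_p, \qquad \rho(p) = -i\hbar\,\partial_q, $$

$$ i[\rho(q), \rho(p)] = -\hbar = \hbar\,\rho(\{q, p\}), \qquad \{q, p\} = v_q\,p = -1, $$

in agreement with Dirac's rule (16.36). On sections that depend only on $q$, the operator $\rho(q)$ reduces to multiplication by $q$ and $\rho(p)$ to $-i\hbar\,\partial/\partial q$, which is the familiar Schrödinger representation. A different choice of $\eta$ with the same $d\eta$ would change the intermediate formulas by a gauge transformation.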

The general definition of a polarized section is rather complicated. We
sketch it here, but give a concrete example in the next section. We begin
by observing that, at each point $x \in M$, the symplectic form defines a skew
bilinear form. We seek a Lagrangian subspace $V_x \subset TM_x$ for this form. A
Lagrangian subspace is one such that $V_x = V_x^\perp$. For example, if


$$ \omega = dp_1 \wedge dq_1 + dp_2 \wedge dq_2 = \frac{1}{2i}\left\{ d(p_1 - iq_1) \wedge d(p_1 + iq_1) + d(p_2 - iq_2) \wedge d(p_2 + iq_2) \right\} $$ (16.37)


then the space spanned by the $\partial_p$’s is Lagrangian, as is the space spanned
by the $\partial_q$’s, and the space spanned by the $\partial_{p+iq}$’s. In the last case, we have
allowed the coefficients of the vectors in $V_x$ to be complex numbers. Now we
let $x$ vary and consider the distribution defined by the vector fields spanning
the $V_x$’s. We require this distribution to be globally integrable so that the
$V_x$ are the tangent spaces to a global foliation of $M$. With these ingredients
at hand, we declare a section $\Psi$ of the line bundle to be polarized if $\nabla_{\bar\xi}\Psi = 0$
for all $\xi \in V_x$. Here, $\bar\xi$ is the vector field whose components are the complex
conjugates of those in $\xi$.

We define an inner product on the space of polarized sections by using the
Liouville measure $\omega^n/n!$ on the phase space. The quantum Hilbert space then
consists of finite-norm polarized sections of the line bundle. Only classical
functions that give rise to polarization-compatible vector fields will have their
Poisson-bracket algebra coincide with the quantum commutator algebra.


Quantizing spin 


To illustrate these ideas, we quantize spin. The classical mechanics of spin 
was discussed in section 11.4.2. There we showed that the appropriate phase 
space is the 2-sphere equipped with a symplectic form proportional to the 
area form. Here we must be specific about the constant of proportionality. 
We choose units in which $\hbar = 1$, and take $\omega = j\,d({\rm Area})$. The integrality of
$\omega/2\pi$ requires that $j$ be an integer or half integer. We will assume that $j$ is
positive.

We parametrize the 2-sphere using complex stereographic co-ordinates $z$,
$\bar z$ which are constructed similarly to those in section 12.4.3. This choice will
allow us to impose a natural complex polarization on the wavefunctions. In
contrast to section 12.4.3, however, it is here convenient to make the point
$z = 0$ correspond to the south pole, so the polar co-ordinates $\theta$, $\phi$ on the
sphere are related to $z$, $\bar z$ via


$$ \cos\theta = \frac{|z|^2 - 1}{|z|^2 + 1}, \qquad e^{i\phi}\sin\theta = \frac{2z}{|z|^2 + 1}, \qquad e^{-i\phi}\sin\theta = \frac{2\bar z}{|z|^2 + 1}. $$ (16.38)
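
The relations (16.38) are easy to test numerically. The sketch below assumes the explicit parametrization $z = \cot(\theta/2)\,e^{i\phi}$, which is our inference from (16.38) together with the requirement that $z = 0$ sit at the south pole; the helper names are illustrative only.

```python
import cmath
import math

def stereographic_z(theta, phi):
    """Assumed parametrization z = cot(theta/2) e^{i phi}; z = 0 at theta = pi."""
    return cmath.exp(1j * phi) / math.tan(theta / 2)

def angles_from_z(z):
    """Return (cos theta, e^{i phi} sin theta) computed from z via (16.38)."""
    r2 = abs(z) ** 2
    return (r2 - 1) / (r2 + 1), 2 * z / (r2 + 1)

theta, phi = 2.0, 0.7
z = stereographic_z(theta, phi)
cos_theta, eiphi_sin_theta = angles_from_z(z)
south_pole = stereographic_z(math.pi, 0.0)
```

Evaluating `angles_from_z` on the image of a point should return the cosine and the complex combination $e^{i\phi}\sin\theta$ of the angles one started from.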


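In terms of the $z$, $\bar z$ co-ordinates the symplectic form is given by


$$ \omega = \frac{2ij}{(1 + |z|^2)^2}\,dz \wedge d\bar z. $$ (16.39)


As long as we avoid the north pole, where $z = \infty$, we can write


$$ \omega = d\left\{ ij\,\frac{z\,d\bar z - \bar z\,dz}{1 + |z|^2} \right\}, $$ (16.40)


and so the local connection form has components proportional to


$$ \eta_z = -ij\,\frac{\bar z}{|z|^2 + 1}, \qquad \eta_{\bar z} = ij\,\frac{z}{|z|^2 + 1}. $$ (16.41)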
The covariant derivatives are therefore

$$ \nabla_z = \frac{\partial}{\partial z} - j\,\frac{\bar z}{|z|^2 + 1}, \qquad \nabla_{\bar z} = \frac{\partial}{\partial\bar z} + j\,\frac{z}{|z|^2 + 1}. $$ (16.42)

We impose the polarization condition that $\nabla_{\bar z}\Psi = 0$. This condition
requires the allowed sections to be of the form


$$ \Psi(z, \bar z) = (1 + |z|^2)^{-j}\,\psi(z), $$ (16.43)


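where $\psi$ depends only on $z$, and not on $\bar z$. It is natural to combine the
$(1 + |z|^2)^{-j}$ prefactor with the Liouville measure so that the inner product
becomes

$$ \langle\psi|\chi\rangle = \frac{2j+1}{2\pi i}\int_{\mathbb{C}} \frac{d\bar z \wedge dz}{(1 + |z|^2)^{2j+2}}\,\overline{\psi(z)}\,\chi(z). $$ (16.44)

The normalizable wavefunctions are then polynomials in $z$ of degree less than
or equal to $2j$, and a complete orthonormal set is given by


$$ \psi_m(z) = \sqrt{\frac{(2j)!}{(j-m)!\,(j+m)!}}\;z^{j+m}, \qquad -j \le m \le j. $$ (16.45)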
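
The normalization of the $\psi_m$ can be confirmed with a short numerical integration. This is a sketch under the assumption that the measure in (16.44) is read as $(2j+1)\,dx\,dy/\pi$ with $z = x + iy$, so that after the trivial angular integral only a radial integral in $u = |z|^2$ remains; states with different $m$ are orthogonal by the angular integral alone. The function name is ours.

```python
import math

def psi_norm_squared(two_j, k, steps=20000):
    """Squared norm of psi_m with m = k - j under the inner product (16.44).

    With u = |z|^2 the norm is
        (2j+1) C(2j,k) * Integral_0^inf u^k (1+u)^(-2j-2) du,
    and the substitution u = t/(1-t) turns the integral into
        Integral_0^1 t^k (1-t)^(2j-k) dt,
    which is evaluated here with the midpoint rule.
    """
    h = 1.0 / steps
    integral = h * sum(((i + 0.5) * h) ** k * (1 - (i + 0.5) * h) ** (two_j - k)
                       for i in range(steps))
    return (two_j + 1) * math.comb(two_j, k) * integral

two_j = 3  # spin j = 3/2
norms = [psi_norm_squared(two_j, k) for k in range(two_j + 1)]
```

Each entry of `norms` should be equal to one up to the quadrature error.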


We desire to find the quantum operators $\rho(J_i)$ corresponding to the
components


$$ J_1 = j\sin\theta\cos\phi, \qquad J_2 = j\sin\theta\sin\phi, \qquad J_3 = j\cos\theta, $$ (16.46)


of a classical spin $\mathbf J$ of magnitude $j$, and also to the ladder-operator
components $J_\pm = J_1 \pm iJ_2$. In our complex co-ordinates, these functions become

$$ J_3 = j\,\frac{|z|^2 - 1}{|z|^2 + 1}, \qquad J_+ = j\,\frac{2z}{|z|^2 + 1}, \qquad J_- = j\,\frac{2\bar z}{|z|^2 + 1}. $$ (16.47)


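Also in these co-ordinates, Hamilton’s equations $dH = -\omega(v_H, \ )$ take the
form


$$ \dot z = i\,\frac{(1 + |z|^2)^2}{2j}\,\frac{\partial H}{\partial\bar z}, \qquad \dot{\bar z} = -i\,\frac{(1 + |z|^2)^2}{2j}\,\frac{\partial H}{\partial z}, $$ (16.48)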
and the Hamiltonian vector fields corresponding to the classical phase space
functions $J_3$, $J_+$ and $J_-$ are


$$ v_{J_3} = iz\,\partial_z - i\bar z\,\partial_{\bar z}, \qquad v_{J_+} = -iz^2\,\partial_z - i\,\partial_{\bar z}, \qquad v_{J_-} = i\,\partial_z + i\bar z^2\,\partial_{\bar z}. $$ (16.49)


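Using the recipe (16.33) for $\rho(f)$ from the previous section, together with
the fact that $\nabla_{\bar z}\Psi = 0$, we find, for example, that


$$ \begin{aligned} \rho(J_+)\,(1 + |z|^2)^{-j}\psi(z) &= \left[ -z^2\left( \frac{\partial}{\partial z} - \frac{j\bar z}{1 + |z|^2} \right) + \frac{2jz}{1 + |z|^2} \right](1 + |z|^2)^{-j}\psi(z) \\ &= (1 + |z|^2)^{-j}\left[ -z^2\frac{\partial}{\partial z} + 2jz \right]\psi(z). \end{aligned} $$ (16.50)
It is natural to define operators
$$ \tilde J_i \stackrel{\rm def}{=} (1 + |z|^2)^{j}\,\rho(J_i)\,(1 + |z|^2)^{-j} $$ (16.51)


that act only on the $z$-polynomial part $\psi(z)$ of the section $\Psi(z, \bar z)$. We then
have


$$ \tilde J_+ = -z^2\frac{\partial}{\partial z} + 2jz. $$ (16.52)
Similarly, we find that
$$ \tilde J_- = \frac{\partial}{\partial z}, $$ (16.53)
$$ \tilde J_3 = z\frac{\partial}{\partial z} - j. $$ (16.54)
These operators obey the su(2) Lie-algebra relations
$$ [\tilde J_3, \tilde J_\pm] = \pm\tilde J_\pm, \qquad [\tilde J_+, \tilde J_-] = 2\tilde J_3, $$ (16.55)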


and act on the $\psi_m(z)$ monomials as


$$ \tilde J_3\,\psi_m(z) = m\,\psi_m(z), \qquad \tilde J_\pm\,\psi_m(z) = \sqrt{j(j+1) - m(m \pm 1)}\;\psi_{m\pm1}(z). $$ (16.56)


This is the familiar action of the su(2) generators on $|j,m\rangle$ basis states.
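
The algebra (16.55) and the ladder action (16.56) can be verified mechanically by letting the differential operators (16.52) to (16.54) act on polynomial coefficient lists. The code below is a small sketch of ours, not part of the original development; polynomials are stored as lists `p[k]` of the coefficients of $z^k$, and all function names are illustrative.

```python
import math

def deriv(p):
    """d/dz of a polynomial stored as coefficients p[k] of z**k."""
    return [k * p[k] for k in range(1, len(p))] or [0.0]

def shift(p, n):
    """Multiply a polynomial by z**n."""
    return [0.0] * n + list(p)

def combine(p, a, q, b):
    """Return a*p + b*q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return [a * x + b * y for x, y in zip(p, q)]

def J_plus(p, j):
    """-z^2 d/dz + 2 j z, as in (16.52)."""
    return combine(shift(deriv(p), 2), -1.0, shift(p, 1), 2.0 * j)

def J_minus(p, j):
    """d/dz, as in (16.53)."""
    return deriv(p)

def J_3(p, j):
    """z d/dz - j, as in (16.54)."""
    return combine(shift(deriv(p), 1), 1.0, p, -j)

def psi(m, j):
    """Coefficients of psi_m(z) = sqrt((2j)!/((j-m)!(j+m)!)) z^(j+m)."""
    k = round(j + m)
    return [0.0] * k + [math.sqrt(math.comb(round(2 * j), k))]

def max_difference(p, q):
    """Largest coefficient of p - q, used to compare two polynomials."""
    return max(abs(c) for c in combine(p, 1.0, q, -1.0))

j, m = 1.5, 0.5
raised = J_plus(psi(m, j), j)
expected = [math.sqrt(j * (j + 1) - m * (m + 1)) * c for c in psi(m + 1, j)]

p = [1.0, -2.0, 0.5, 3.0]  # an arbitrary cubic, used to test [J+, J-] = 2 J3
commutator = combine(J_plus(J_minus(p, j), j), 1.0,
                     J_minus(J_plus(p, j), j), -1.0)
```

Here `raised` should coincide with `expected`, `commutator` with twice `J_3(p, j)`, and `J_plus` should annihilate the top monomial `psi(j, j)`, which is how the finite dimension $2j+1$ of the representation shows up.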


Exercise 16.1: Show that with respect to the inner product (16.44) we have
$$ \tilde J_3^\dagger = \tilde J_3, \qquad \tilde J_+^\dagger = \tilde J_-. $$
Coherent states and the Borel-Weil-Bott theorem


We now explain how the spin wavefunctions $\psi_m(z)$ can be understood as
sections of a holomorphic line bundle.

Suppose that we have a compact Lie group $G$ and a unitary irreducible
representation $g \in G \mapsto D^J(g)$. Let $|0\rangle$ be the normalized highest (or lowest)
weight state in the representation space. Consider the states


$$ |g\rangle = D^J(g)|0\rangle, \qquad \langle g| = \langle 0|\,[D^J(g)]^\dagger. $$ (16.57)


The $|g\rangle$ compose a family of generalized coherent states.² There is a
continuous infinity of the $|g\rangle$, and so they cannot constitute an orthonormal set on
the finite-dimensional representation space. The matrix-element orthogonality
property (15.81), however, provides us with a useful over-completeness
relation:

$$ I = \frac{\dim(J)}{{\rm Vol}\,G}\int_G |g\rangle\langle g|. $$ (16.58)


The integral is over all of $G$, but many points in $G$ give the same contribution.
The maximal torus, denoted by $T$, is the abelian subgroup of $G$ obtained by
exponentiating elements of the Cartan algebra. Because any weight vector is
a common eigenvector of the Cartan algebra, elements of $T$ leave $|0\rangle$ fixed up
to a phase. The set of distinct $|g\rangle$ in the integral can therefore be identified
with $G/T$. This coset space is always an even-dimensional manifold, and
thus a candidate phase space.

Consider, in particular, the spin-$j$ representation of SU(2). The coset
space $G/T$ is then ${\rm SU}(2)/{\rm U}(1) \simeq S^2$. We can write a general element of
SU(2) as


$$ U = \exp(\bar z J_+)\,\exp(\theta J_3)\,\exp(\gamma J_-) $$ (16.59)


for some complex parameters $\bar z$, $\theta$ and $\gamma$ which are functions of the three real
co-ordinates that parameterize SU(2). We let $U$ act on the lowest-weight
state $|j,-j\rangle$. The rightmost factor has no effect on the lowest weight state,
and the middle factor only multiplies it by a constant. We therefore restrict
our attention to the states


$$ |z\rangle = \exp(\bar z J_+)\,|j,-j\rangle, \qquad \langle z| = \langle j,-j|\,\exp(zJ_-) = (|z\rangle)^\dagger. $$ (16.60)


² A. Perelomov, Generalized Coherent States and their Applications (Springer-Verlag,
Berlin 1986).
These states are not normalized, but have the advantage that the $\langle z|$ are
holomorphic in the parameter $z$, i.e., they depend on $z$ but not on $\bar z$.

The set of distinct $|z\rangle$ can still be identified with the 2-sphere, and $z$, $\bar z$
are its complex stereographic co-ordinates. This identification is an example
of a general property of compact Lie groups:


$$ G/T \cong G_{\mathbb{C}}/B_+. $$ (16.61)


Here, $G_{\mathbb{C}}$ is the complexification of $G$ (the group $G$, but with its parameters
allowed to be complex) and $B_+$ is the Borel group whose Lie algebra consists
of the Cartan algebra together with the step-up ladder operators.

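The inner product of two $|z\rangle$ states is


$$ \langle z'|z\rangle = (1 + \bar z z')^{2j}, $$ (16.62)
and the eigenstates $|j,m\rangle$ of $J^2$ and $J_3$ possess coherent state wavefunctions:


$$ \psi_m^{(1)}(z) = \langle z|j,m\rangle = \sqrt{\frac{(2j)!}{(j-m)!\,(j+m)!}}\;z^{j+m}. $$ (16.63)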


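We recognize these as our spin wavefunctions from the previous section.
The over-completeness relation can be written as


$$ I = \frac{2j+1}{2\pi i}\int \frac{d\bar z \wedge dz}{(1 + \bar z z)^{2j+2}}\,|z\rangle\langle z|, $$ (16.64)


and provides the inner product for the coherent-state wavefunctions. If
$\psi(z) = \langle z|\psi\rangle$ and $\chi(z) = \langle z|\chi\rangle$ then

$$ \begin{aligned} \langle\psi|\chi\rangle &= \frac{2j+1}{2\pi i}\int \frac{d\bar z \wedge dz}{(1 + \bar z z)^{2j+2}}\,\langle\psi|z\rangle\langle z|\chi\rangle \\ &= \frac{2j+1}{2\pi i}\int \frac{d\bar z \wedge dz}{(1 + \bar z z)^{2j+2}}\,\overline{\psi(z)}\,\chi(z), \end{aligned} $$ (16.65)


which coincides with (16.44).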

The wavefunctions $\psi_m^{(1)}(z)$ are singular at the north pole, where $z = \infty$.
Indeed, there is no actual state $\langle\infty|$ because the phase of this putative limiting
state would depend on the direction from which we approach the point at
infinity. We may, however, define a second family of coherent states:


$$ |\zeta\rangle_2 = \exp(\bar\zeta J_-)\,|j,j\rangle, \qquad {}_2\langle\zeta| = \langle j,j|\,\exp(\zeta J_+), $$ (16.66)
and form the wavefunctions


$$ \psi_m^{(2)}(\zeta) = {}_2\langle\zeta|j,m\rangle. $$ (16.67)


These new states and wavefunctions are well defined in the vicinity of the 
north pole, but singular near the south pole. 

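To find the relation between $\psi_m^{(2)}(\zeta)$ and $\psi_m^{(1)}(z)$ we note that the matrix
identity


$$ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ z & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -z^{-1} & 1 \end{pmatrix}\begin{pmatrix} -z & 0 \\ 0 & -z^{-1} \end{pmatrix}\begin{pmatrix} 1 & z^{-1} \\ 0 & 1 \end{pmatrix}, $$ (16.68)


coupled with the faithfulness of the spin-$\frac{1}{2}$ representation of SU(2), implies
the relation


$$ w\,\exp(zJ_-) = \exp(-z^{-1}J_-)\,(-z)^{2J_3}\,\exp(z^{-1}J_+), $$ (16.69)
where $w = \exp(-i\pi J_2)$. We also note that
$$ \langle j,j|\,w = (-1)^{2j}\langle j,-j|, \qquad \langle j,-j|\,w = \langle j,j|. $$ (16.70)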
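Thus,


$$ \begin{aligned} \psi_m^{(1)}(z) &= \langle j,-j|\,e^{zJ_-}|j,m\rangle \\ &= (-1)^{2j}\langle j,j|\,w\,e^{zJ_-}|j,m\rangle \\ &= (-1)^{2j}\langle j,j|\,e^{-z^{-1}J_-}(-z)^{2J_3}e^{z^{-1}J_+}|j,m\rangle \\ &= (-1)^{2j}(-z)^{2j}\langle j,j|\,e^{z^{-1}J_+}|j,m\rangle \\ &= z^{2j}\,\psi_m^{(2)}(z^{-1}). \end{aligned} $$ (16.71)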


The transition function $z^{2j}$ that relates $\psi_m^{(1)}(z)$ to $\psi_m^{(2)}(\zeta = 1/z)$ depends only
on $z$. We therefore say that the wavefunctions $\psi_m^{(1)}(z)$ and $\psi_m^{(2)}(\zeta)$ are the local
components of a global section $\Psi_m \leftrightarrow |j,m\rangle$ of a holomorphic line bundle.
The requirement that the transition function and its inverse be holomorphic
and single valued in the overlap of the $z$ and $\zeta$ co-ordinate patches forces $2j$
to be an integer. The $\Psi_m$ form a basis for the space of global holomorphic
sections of this bundle.
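
The spin-$\frac{1}{2}$ matrix identity behind the transition function can be confirmed directly. The sketch below assumes the usual basis in which $J_3 = {\rm diag}(\frac12, -\frac12)$, $J_+$ has a single 1 in the upper-right corner, $J_-$ is its transpose, and $\exp(-i\pi J_2)$ is the real matrix with rows $(0, -1)$ and $(1, 0)$; the variable names are ours.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

z = 0.7 + 0.4j                       # any non-zero complex number

w = ((0, -1), (1, 0))                # exp(-i pi J_2) for spin 1/2
exp_z_J_minus = ((1, 0), (z, 1))     # exp(z J_-)

lhs = matmul(w, exp_z_J_minus)
rhs = matmul(matmul(((1, 0), (-1 / z, 1)),      # exp(-J_- / z)
                    ((-z, 0), (0, -1 / z))),    # (-z)^(2 J_3)
             ((1, 1 / z), (0, 1)))              # exp(J_+ / z)
```

Both `lhs` and `rhs` should equal the matrix with rows $(-z, -1)$ and $(1, 0)$, for any non-zero `z`.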

Borel, Weil and Bott showed that any finite-dimensional representation of 
a semi-simple Lie group G can be realized as the space of global holomorphic 
sections of a line bundle over $G_{\mathbb{C}}/B_+$. This bundle is constructed from the
highest (or lowest) weight vectors in the representation by a natural generalization
of the method we have used for spin. This idea has been extended by
Witten and others to infinite-dimensional Lie groups, where it can be used, 
for example, to quantize two-dimensional gravity. 


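Exercise 16.2: Normalize the states $|z\rangle$, $\langle z|$, by multiplying them by $N = (1 + |z|^2)^{-j}$.
Show that


$$ N^2\langle z|J_3|z\rangle = j\,\frac{|z|^2 - 1}{|z|^2 + 1}, \qquad N^2\langle z|J_+|z\rangle = j\,\frac{2z}{|z|^2 + 1}, \qquad N^2\langle z|J_-|z\rangle = j\,\frac{2\bar z}{|z|^2 + 1}, $$


thus confirming the identification of $z$, $\bar z$ with the complex stereographic
co-ordinates on the sphere.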


16.3. Working in the total space 


We have mostly considered a bundle to be a collection of mathematical ob- 
jects and a base space to which they are attached, rather than treating the 
bundle as a geometric object in its own right. In this section we demonstrate 
the advantages to be gained from the latter viewpoint. 


16.3.1 Principal bundles and associated bundles 


The fibre bundles that arise in a gauge theory with Lie group $G$ are called
principal $G$-bundles, and the fields and wavefunctions are sections of associated
bundles. A principal $G$-bundle comprises the total space, which we here
call $P$, together with the projection $\pi$ to the base space $M$. The fibre can be
regarded as a copy of $G$, i.e.,


$$ \pi : P \to M, \qquad \pi^{-1}(x) \cong G. $$ (16.72)


Strictly speaking, the fibre is only required to be a homogeneous space on
which $G$ acts freely and transitively on the right: $x \to xg$. Such a set can
be identified with $G$ after we have selected a fiducial point $f_0 \in F$ to be
the group identity. There is no canonical choice for $f_0$ and, if the bundle is
twisted, there can be no globally smooth choice. This is because a smooth
choice for $f_0$ in the fibres above an open subset $U \subset M$ makes $P$ locally
into a product $U \times G$. Being able to extend $U$ to the entirety of $M$ means
that $P$ is trivial. We will, however, make use of local assignments $f_0 \mapsto e$
to introduce bundle co-ordinate charts in which $P$ is locally a product, and
therefore parametrized by ordered pairs $(x, g)$ with $x \in U$ and $g \in G$.

To understand the bundles associated with $P$, it is simplest to define the
sections of the associated bundle. Let $\varphi_i(x, g)$ be a function on the total
space $P$ with a set of indices $i$ carrying some representation $g \mapsto D(g)$ of
$G$. We say that $\varphi_i(x, g)$ is a section of an associated bundle if it varies in a
particular way as we run up and down the fibres by acting on them from the
right with elements of $G$; we require that


$$ \varphi_i(x, gh) = D_{ij}(h^{-1})\,\varphi_j(x, g). $$ (16.73)


These sections can be thought of as wavefunctions for a particle moving in 
a gauge field on the base space. The choice of representation D plays the 
role of “charge,” and (16.73) are the gauge transformations. Note that we 
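must take $h^{-1}$ as the argument of $D$ in order for the transformation to be
consistent under group multiplication:
$$ \begin{aligned} \varphi_i(x, gh_1h_2) &= D_{ij}(h_2^{-1})\,\varphi_j(x, gh_1) \\ &= D_{ij}(h_2^{-1})\,D_{jk}(h_1^{-1})\,\varphi_k(x, g) \\ &= D_{ik}(h_2^{-1}h_1^{-1})\,\varphi_k(x, g) \\ &= D_{ik}((h_1h_2)^{-1})\,\varphi_k(x, g). \end{aligned} $$ (16.74)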
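
The need for the inverse in (16.73) can be seen in a small non-abelian example. The sketch below assumes $G = {\rm SU}(2)$ in its defining representation, suppresses the base-space point $x$, and uses the sample function $\varphi(g) = D(g^{-1})v_0$ with an arbitrary fixed vector $v_0$; these choices, and all the names, are ours and serve only as an illustration.

```python
import cmath
import math

def su2(alpha, beta, gamma):
    """An SU(2) matrix [[a, -conj(b)], [b, conj(a)]] with |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * alpha) * math.cos(gamma)
    b = cmath.exp(1j * beta) * math.sin(gamma)
    return ((a, -b.conjugate()), (b, a.conjugate()))

def mul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def dagger(a):
    """Conjugate transpose, which is the inverse for a unitary matrix."""
    return tuple(tuple(a[j][i].conjugate() for j in range(2)) for i in range(2))

def act(a, v):
    """Matrix acting on a 2-component vector."""
    return tuple(sum(a[i][k] * v[k] for k in range(2)) for i in range(2))

v0 = (1.0, 0.5j)  # arbitrary fixed vector, illustrative only

def phi(g):
    """Sample section: phi(g) = D(g^{-1}) v0, so phi(g h) = D(h^{-1}) phi(g)."""
    return act(dagger(g), v0)

g, h1, h2 = su2(0.3, 1.1, 0.4), su2(-0.7, 0.2, 1.0), su2(1.5, -0.4, 0.8)
lhs = phi(mul(mul(g, h1), h2))
rhs = act(dagger(mul(h1, h2)), phi(g))   # D((h1 h2)^{-1}) phi(g)
wrong = act(mul(h1, h2), phi(g))         # D(h1 h2) phi(g), without the inverse
```

The vectors `lhs` and `rhs` agree, as in (16.74), whereas `wrong` does not reproduce `lhs`.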


The construction of the associated bundle itself requires rather more
abstraction. Suppose that the matrices $D(g)$ act on the vector space $V$. Then
the total space $P_V$ of the associated bundle consists of equivalence classes
of $P \times V$ under the relation $((x, g), v) \sim ((x, gh), D(h^{-1})v)$ for all $v \in V$,
$(x, g) \in P$ and $h \in G$. The set of $G$-action equivalence classes in a Cartesian
product $A \times B$ is usually denoted by $A \times_G B$. Our total space is therefore


$$ P_V = P \times_G V. $$ (16.75)


We find it conceptually easier to work with the sections as defined above, 
rather than with these equivalence classes. 
16.3.2 Connections


A gauge field is a connection on a principal bundle. The formal definition of
a connection is a decomposition of the tangent space $TP_p$ of $P$ at $p \in P$ into
a horizontal subspace $H_p(P)$ and a vertical subspace $V_p(P)$. We require that
$V_p(P)$ be the tangent space to the fibres and $H_p(P)$ be a complementary
subspace, i.e., the direct sum should be the whole tangent space


$$ TP_p = H_p(P) \oplus V_p(P). $$ (16.76)


The horizontal subspaces must also be invariant under the push-forward
induced from the action on the fibres from the right of a fixed element
of $G$. More formally, if $R[g] : P \to P$ acts to take $p \to pg$, i.e., by
$R[g](x, g') = (x, g'g)$, we require that


$$ R[g]_*H_p(P) = H_{pg}(P). $$ (16.77)


Thus, we get to choose one horizontal subspace in each fibre, the rest being
determined by the right-invariance condition.

We now show how this geometric definition of a connection leads to
parallel-transport. We begin with a curve $x(t)$ in the base space. By solving
the equation

$$ \frac{dg}{dt} + \frac{dx^\mu}{dt}\,\mathcal{A}_\mu(x)\,g = 0, $$ (16.78)

we can lift the curve $x(t)$ to a new curve $(x(t), g(t))$ in the total space,
whose tangent is everywhere horizontal. This lifting operation corresponds
to parallel transporting the initial value $g(0)$ along the curve $x(t)$ to get
$g(t)$. The $\mathcal{A}_\mu = i\lambda_a\mathcal{A}^a_\mu$ are a set of Lie-algebra-valued functions that are
determined by our choice of horizontal subspace. They are defined so that
the vector $(\delta x^\mu, -\mathcal{A}_\mu\delta x^\mu g)$ is horizontal for each small displacement $\delta x^\mu$ in the
tangent space of $M$. Here, $-\mathcal{A}_\mu\delta x^\mu g$ is to be understood as the displacement
that takes $g \to (1 - \mathcal{A}_\mu\delta x^\mu)g$. Because we are multiplying $\mathcal{A}$ in from the
left, the lifted curve can be slid rigidly up and down the fibres by the right
action of any fixed group element. The right-invariance condition is therefore
automatically satisfied.
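
The compatibility of the lifting rule $g \to (1 - \mathcal{A}_\mu\delta x^\mu)g$ with the right action can be illustrated numerically. The sketch below is ours: it assumes $2\times2$ matrices, a made-up connection with components $\mathcal{A}_x = i y\,\sigma_1$ and $\mathcal{A}_y = i x\,\sigma_2$ chosen purely for illustration, the unit circle as base curve, and first-order (Euler) steps, which keep $g$ in the group only approximately.

```python
import math

I2 = ((1, 0), (0, 1))
SIGMA_1 = ((0, 1), (1, 0))
SIGMA_2 = ((0, -1j), (1j, 0))

def mul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def combine(a, ca, b, cb):
    """Linear combination ca*a + cb*b of two 2x2 matrices."""
    return tuple(tuple(ca * a[i][j] + cb * b[i][j] for j in range(2))
                 for i in range(2))

def connection(x, y):
    """Hypothetical anti-Hermitian components (A_x, A_y), for illustration."""
    zero = ((0, 0), (0, 0))
    return combine(SIGMA_1, 1j * y, zero, 0), combine(SIGMA_2, 1j * x, zero, 0)

def lift_unit_circle(g0, steps=400):
    """Transport g0 once around the unit circle with g -> (1 - A_mu dx^mu) g."""
    g = g0
    for k in range(steps):
        t0 = 2 * math.pi * k / steps
        t1 = 2 * math.pi * (k + 1) / steps
        dx = math.cos(t1) - math.cos(t0)
        dy = math.sin(t1) - math.sin(t0)
        A_x, A_y = connection(math.cos(t0), math.sin(t0))
        step = combine(I2, 1, combine(A_x, dx, A_y, dy), -1)
        g = mul(step, g)          # the connection multiplies in from the left
    return g

h = ((0.6, -0.8), (0.8, 0.6))     # fixed group element acting on the right
lifted = lift_unit_circle(I2)
lifted_from_h = lift_unit_circle(mul(I2, h))
slid = mul(lifted, h)
```

Because every step multiplies from the left, starting the lift at $g(0)h$ gives the same curve as lifting from $g(0)$ and then sliding by $h$, so `lifted_from_h` and `slid` agree to rounding error.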
The directional derivative along the lifted curve is


$$ \mathcal{D}_\mu = \left(\frac{\partial}{\partial x^\mu}\right)_g - \mathcal{A}^a_\mu R_a, $$ (16.79)
where $R_a$ is a right-invariant vector field on $G$, i.e., a differential operator on
functions defined on the fibres. The $\mathcal{D}_\mu$ are a set of vector fields in $TP$. These
covariant derivatives span the horizontal subspace at each point $p \in P$, and
have Lie brackets

$$ [\mathcal{D}_\mu, \mathcal{D}_\nu] = -\mathcal{F}^a_{\mu\nu}R_a. $$ (16.80)


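Here, $\mathcal{F}^a_{\mu\nu}$ is given in terms of the structure constants appearing in the Lie
brackets $[R_a, R_b] = f_{ab}{}^c R_c$ by
$$ \mathcal{F}^c_{\mu\nu} = \partial_\mu\mathcal{A}^c_\nu - \partial_\nu\mathcal{A}^c_\mu - f_{ab}{}^c\,\mathcal{A}^a_\mu\mathcal{A}^b_\nu. $$ (16.81)
We can also write


$$ \mathcal{F}_{\mu\nu} = \partial_\mu\mathcal{A}_\nu - \partial_\nu\mathcal{A}_\mu + [\mathcal{A}_\mu, \mathcal{A}_\nu], $$ (16.82)


where $\mathcal{F}_{\mu\nu} = i\lambda_a\mathcal{F}^a_{\mu\nu}$ and $[\lambda_a, \lambda_b] = if_{ab}{}^c\lambda_c$.

Because the Lie bracket of the $\mathcal{D}_\mu$ is a linear combination of the $R_a$, it lies
entirely in the vertical subspace. Consequently, when $\mathcal{F}_{\mu\nu} \neq 0$ the $\mathcal{D}_\mu$ are not
in involution, so Frobenius’ theorem tells us that the horizontal subspaces
cannot fit together to form the tangent spaces to a smooth foliation of $P$.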

We now make contact with the more familiar definition of a covariant
derivative. We begin by recalling that right invariant vector fields are
derivatives that involve infinitesimal multiplication from the left. Their definition
is

$$ R_a\varphi_i(x, g) = \lim_{\epsilon\to0}\frac{1}{\epsilon}\Big( \varphi_i(x, (1 + i\epsilon\lambda_a)g) - \varphi_i(x, g) \Big), $$ (16.83)


where $[\lambda_a, \lambda_b] = if_{ab}{}^c\lambda_c$.
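As $\varphi_i(x, g)$ is a section of the associated bundle, we know how it varies
when we multiply group elements in on the right. We therefore write


$$ (1 + i\epsilon\lambda_a)g = g\,g^{-1}(1 + i\epsilon\lambda_a)g, $$ (16.84)


and from this (and writing $g$ for $D(g)$ where it makes for compact notation)
we find


$$ \begin{aligned} R_a\varphi_i(x, g) &= \lim_{\epsilon\to0}\Big( D_{ij}(g^{-1}(1 - i\epsilon\lambda_a)g)\,\varphi_j(x, g) - \varphi_i(x, g) \Big)/\epsilon \\ &= -D_{ij}(g^{-1})\,(i\lambda_a)_{jk}\,D_{kl}(g)\,\varphi_l(x, g) \\ &= -i(g^{-1}\lambda_a g)_{ij}\,\varphi_j. \end{aligned} $$ (16.85)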
Here, $i(\lambda_a)_{ij}$ is the matrix representing the Lie algebra generator $i\lambda_a$ in the
representation $g \mapsto D(g)$. Acting on sections, we therefore have


$$ \mathcal{D}_\mu\varphi = (\partial_\mu\varphi)_g + (g^{-1}\mathcal{A}_\mu g)\,\varphi. $$ (16.86)


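This still does not look too familiar, because the derivatives with respect to
$x^\mu$ are being taken at fixed $g$. We normally fix a gauge by making a choice of
$g = \sigma(x)$ for each $x^\mu$. The conventional wavefunction $\varphi(x)$ is then $\varphi(x, \sigma(x))$.
We can use $\varphi(x, \sigma(x)) = \sigma^{-1}(x)\varphi(x, e)$ to obtain


$$ \partial_\mu\varphi = (\partial_\mu\varphi)_\sigma + (\partial_\mu\sigma^{-1})\sigma\varphi = (\partial_\mu\varphi)_\sigma - (\sigma^{-1}\partial_\mu\sigma)\varphi. $$ (16.87)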
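From this, we get a derivative


$$ \nabla_\mu = \partial_\mu + (\sigma^{-1}\mathcal{A}_\mu\sigma + \sigma^{-1}\partial_\mu\sigma) = \partial_\mu + A_\mu $$ (16.88)


on functions $\varphi(x) \stackrel{\rm def}{=} \varphi(x, \sigma(x))$ defined (locally) on the base space $M$. This
is the conventional covariant derivative, now containing gauge fields


$$ A_\mu(x) = \sigma^{-1}\mathcal{A}_\mu\sigma + \sigma^{-1}\partial_\mu\sigma $$ (16.89)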


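that are gauge transformations of our $g$-independent $\mathcal{A}_\mu$. The derivative has
been constructed so that


$$ \nabla_\mu\varphi(x) = \mathcal{D}_\mu\varphi(x, g)\big|_{g=\sigma(x)}, $$ (16.90)
and has commutator
$$ [\nabla_\mu, \nabla_\nu] = \sigma^{-1}\mathcal{F}_{\mu\nu}\sigma = F_{\mu\nu}. $$ (16.91)


Note the sign change vis-à-vis equation (16.80).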
It is the curvature tensor $F_{\mu\nu}$ that we have met previously. Recall that it
provides a Lie-algebra-valued two-form


$$ F = \frac{1}{2}F_{\mu\nu}\,dx^\mu dx^\nu = dA + A^2 $$ (16.92)


on the base space. The connection $A = A_\mu dx^\mu$ is a one-form on the base
space, and both $F$ and $A$ have been defined only in the region $U \subset M$ in
which the smooth gauge-choice section $\sigma(x)$ has been selected.
16.3.3. Monopole harmonics


The total-space operations and definitions in these sections may seem rather 
abstract. We therefore demonstrate their power by solving the Schrödinger
problem for a charged particle confined to a unit sphere surrounding a magnetic
monopole. The conventional approach to this problem involves first
selecting a gauge for the vector potential $A$, which, because of the monopole,
is necessarily singular at a Dirac string located somewhere on the sphere, 
and then delving into properties of Gegenbauer polynomials. Eventually we 
find the gauge-dependent wavefunction. By working with the total space, 
however, we can solve the problem in all gauges at once, and the problem 
becomes a simple exercise in Lie-group geometry. 

Recall that the SU(2) representation matrices $D^J_{mn}(\theta, \phi, \psi)$ form a
complete orthonormal set of functions on the group manifold $S^3$. There will be a
similar complete orthonormal set of representation matrices on the manifold
of any compact Lie group $G$. Given a subgroup $H \subset G$, we will use these
matrices to construct bundles associated to a principal $H$-bundle that has $G$
as its total space and the coset space $G/H$ as its base space. The fibres will
be copies of $H$, and the projection $\pi$ the usual projection $G \to G/H$.

The functions $D^J_{mn}(g)$ are not in general functions on the coset space $G/H$
as they depend on the choice of representative. Instead, because of the
representation property, they vary with the choice of representative in a
well-defined way:

$$ D^J_{mn}(gh) = D^J_{mn'}(g)\,D^J_{n'n}(h). $$ (16.93)


Since we are dealing with compact groups, the representations can be taken
to be unitary and therefore


$$ [D^J_{mn}(gh)]^* = [D^J_{mn'}(g)]^*\,[D^J_{n'n}(h)]^* $$ (16.94)
$$ \phantom{[D^J_{mn}(gh)]^*} = D^J_{nn'}(h^{-1})\,[D^J_{mn'}(g)]^*. $$ (16.95)


This is the correct variation under the right action of the group $H$ for the
set of functions $[D^J_{mn}(g)]^*$ to be sections of a bundle associated with the
principal fibre bundle $G \to G/H$. The representation $h \mapsto D(h)$ of $H$ is not
necessarily that defined by the label $J$ because irreducible representations of
$G$ may be reducible under $H$; $D$ depends on what representation of $H$ the
index $n$ belongs to. If $D$ is the identity representation, then the functions
are functions on $G/H$ in the ordinary sense. For $G = {\rm SU}(2)$ and $H$ being
the U(1) subgroup generated by $J_3$, the quotient space is just $S^2$, and the
projection is the Hopf map: $S^3 \to S^2$. The resulting bundle can be called
the Hopf bundle. It is not a really new object, however, because it is a
generalization of the monopole bundle of the preceding section. Parameterizing
SU(2) with Euler angles, so that


$$ D^J_{mn}(\theta, \phi, \psi) = \langle J, m|\,e^{-i\phi J_3}e^{-i\theta J_2}e^{-i\psi J_3}\,|J, n\rangle, $$ (16.96)
shows that the Hopf map consists of simply forgetting about $\psi$, so
$$ {\rm Hopf} : [(\theta, \phi, \psi) \in S^3] \mapsto [(\theta, \phi) \in S^2]. $$ (16.97)


The bundle is twisted because $S^3$ is not a product $S^2 \times S^1$. Taking $n = 0$ gives
us functions independent of $\psi$, and we obtain the well-known identification
of the spherical harmonics with representation matrices


$$ Y^L_m(\theta, \phi) = \sqrt{\frac{2L+1}{4\pi}}\,[D^L_{m0}(\theta, \phi, 0)]^*. $$ (16.98)


For $n = \Lambda \neq 0$ we get sections of a bundle whose Chern number is $2\Lambda$. These
sections are the monopole harmonics:


$$ \mathcal{Y}^J_{m;\Lambda}(\theta, \phi, \psi) = \sqrt{\frac{2J+1}{4\pi}}\,[D^J_{m\Lambda}(\theta, \phi, \psi)]^* $$ (16.99)


for a monopole of flux $\int eB\,d({\rm Area}) = 4\pi\Lambda$. The integrality of the Chern
number tells us that the flux $4\pi\Lambda$ must be an integer multiple of $2\pi$. This
gives us a geometric reason for why the eigenvalues $m$ of $J_3$ can only be an
integer or half integer.

The monopole harmonics have a non-trivial dependence, $\propto e^{i\Lambda\psi}$, on the
choice we make for $\psi$ at each point on $S^2$, and we cannot make a globally
smooth choice; we always encounter at least one point where there is a
singularity. Considered as functions on the base space, the sections of the twisted
bundle have to be constructed in patches and glued together using transition
functions. As functions on the total space of the principal bundle, however,
they are globally smooth.

We now show that the monopole harmonics are eigenfunctions of the
Schrödinger operator $-\nabla^2$ containing the gauge field connection, just as the
spherical harmonics are eigenfunctions of the Laplacian on the sphere. This
is a simple geometrical exercise. Because they are irreducible representations,
the $D^J(g)$ are automatically eigenfunctions of the quadratic Casimir operator


$$ (J_1^2 + J_2^2 + J_3^2)\,D^J(g) = J(J+1)\,D^J(g). $$ (16.100)
The $J_i$ can be either right or left-invariant vector fields on $G$; the quadratic
Casimir is the same second-order differential operator in either case, and it
is a good guess that it is proportional to the Laplacian on the group manifold.
Taking a locally geodesic co-ordinate system (in which the connection
vanishes) confirms this: $J^2 = -\nabla^2$ on the three-sphere. The operator in
(16.100) is not the Laplacian we want, however. What we need is the $\nabla^2$ on
the two-sphere $S^2 = G/H$, including the connection. This $\nabla^2$ operator
differs from the one on the total space since it must contain only differential
operators lying in the horizontal subspaces. There is a natural notion of
orthogonality in the Lie group, deriving from the Killing form, and it is natural
to choose the horizontal subspaces to be orthogonal to the fibres of $G/H$.
Because multiplication on the right by the subgroup generated by $J_3$ moves
one up and down the fibres, the orthogonal displacements are obtained by
multiplication on the right by infinitesimal elements made by exponentiating
$J_1$ and $J_2$. The desired $\nabla^2$ is thus made out of the left-invariant vector fields
(which act by multiplication on the right), $J_1$ and $J_2$ only. The wave operator
must therefore be

$$ -\nabla^2 = J_1^2 + J_2^2 = J^2 - J_3^2. $$ (16.101)


Applying this to the $\mathcal{Y}^J_{m;\Lambda}$, we see that they are eigenfunctions of $-\nabla^2$ on $S^2$
with eigenvalues $J(J+1) - \Lambda^2$. The Laplace eigenvalues for our flux $= 4\pi\Lambda$
monopole problem are therefore


$$ E_{J,m} = J(J+1) - \Lambda^2, \qquad J \ge |\Lambda|, \quad -J \le m \le J. $$ (16.102)
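
To get a feel for the spectrum (16.102), the following sketch tabulates the first few levels and their degeneracies $2J+1$. It is an illustration of ours: the function name and the use of exact fractions for half-integer $\Lambda$ are arbitrary choices.

```python
from fractions import Fraction

def monopole_levels(two_lambda, count=4):
    """First few (J, eigenvalue, degeneracy) for flux 4*pi*Lambda.

    Lambda = two_lambda / 2 may be an integer or a half integer; the allowed
    J start at |Lambda| and increase in unit steps, each level being
    (2J + 1)-fold degenerate in m.
    """
    lam = Fraction(two_lambda, 2)
    levels = []
    for n in range(count):
        J = abs(lam) + n
        levels.append((J, J * (J + 1) - lam ** 2, int(2 * J + 1)))
    return levels

levels = monopole_levels(two_lambda=3)   # Lambda = 3/2
```

For $\Lambda = 3/2$ the lowest level has $J = 3/2$, eigenvalue $3/2$ and degeneracy 4, and the next has eigenvalue $13/2$ and degeneracy 6. In general the $n$-th level is $|\Lambda|(2n+1) + n(n+1)$.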


The utility of the monopole harmonics is not restricted to exotic monopole
physics. They occur in molecular and nuclear physics as the wavefunctions
for the rotational degrees of freedom of diatomic molecules and uniaxially
deformed nuclei that possess angular momentum $\Lambda$ about their axis of
symmetry.³


Exercise 16.3: Compare these energy levels for a particle on a sphere with 
those of the Landau level problem on the plane. Show that for any fixed 
flux the low-lying energies remain close to $E = (eB/m_{\rm particle})(n + 1/2)$, with
$n = 0, 1, \ldots$, but their degeneracy is equal to the number of flux units
penetrating the sphere plus one.


³ This is explained, with characteristic terseness, in a footnote on page 317 of Landau
and Lifshitz Quantum Mechanics (Third Edition).
16.3.4 Bundle connection and curvature forms


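Recall that in section 16.3.2 we introduced the Lie-algebra-valued functions
$\mathcal{A}_\mu(x)$. We now use these functions to introduce the bundle connection form
$\mathbb{A}$ that lives in $T^*P$. We set


$$ \mathcal{A} = \mathcal{A}_\mu\,dx^\mu $$ (16.103)


and
$$ \mathbb{A} \stackrel{\rm def}{=} g^{-1}\left(\mathcal{A} + \delta g\,g^{-1}\right)g. $$ (16.104)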


In these definitions, $x$ and $g$ are the local co-ordinates in which points in
the total space are labelled as $(x, g)$, and $d$ acts on functions of $x$, and the
“$\delta$” is used to denote the exterior derivative acting on the fibre.⁴ We have,
then, that $\delta x^\mu = 0$ and $dg = 0$. The combinations $\delta g\,g^{-1}$ and $g^{-1}\delta g$ are
respectively the right- and left-invariant Maurer-Cartan forms on the group.

The complete exterior derivative in the total space requires us to differentiate
both with respect to $g$ and with respect to $x$, and is given by $d_{\rm tot} = d + \delta$.
Because $d^2$, $\delta^2$ and $(d + \delta)^2 = d^2 + \delta^2 + d\delta + \delta d$ are all zero, we must have


$$ d\delta + \delta d = 0. $$ (16.105)


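We now define the bundle curvature form in terms of $\mathbb{A}$ to be

$$\mathbb{F} \stackrel{\rm def}{=} d_{\rm tot}\mathbb{A} + \mathbb{A}^2. \tag{16.106}$$

To compute $\mathbb{F}$ in terms of $A(x)$ and $g$ we need the ingredients

$$d\mathbb{A} = g^{-1}(dA)g, \tag{16.107}$$

and

$$\delta\mathbb{A} = -(g^{-1}\delta g)(g^{-1}Ag) - (g^{-1}Ag)(g^{-1}\delta g) - (g^{-1}\delta g)^2. \tag{16.108}$$

We find that

$$\mathbb{F} = (d+\delta)\mathbb{A} + \mathbb{A}^2 = g^{-1}\left(dA + A^2\right)g = g^{-1}Fg, \tag{16.109}$$

where

$$F = \frac{1}{2}F_{\mu\nu}\,dx^\mu dx^\nu, \tag{16.110}$$

and

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu]. \tag{16.111}$$

⁴It is not therefore to be confused with the Hodge $\delta = d^\dagger$ operator.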
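The cancellation that leads to (16.109) can be displayed term by term. The lines below are a sketch that uses only $\mathbb{A} = g^{-1}Ag + g^{-1}\delta g$, $dg = 0$, and the fact that $\delta$ picks up a minus sign when it passes the one-form $A$; no new assumptions are introduced.

```latex
\begin{align*}
d\mathbb{A} &= g^{-1}(dA)\,g,\\
\delta\mathbb{A} &= -(g^{-1}\delta g)(g^{-1}Ag) - (g^{-1}Ag)(g^{-1}\delta g) - (g^{-1}\delta g)^2,\\
\mathbb{A}^2 &= g^{-1}A^2 g + (g^{-1}Ag)(g^{-1}\delta g) + (g^{-1}\delta g)(g^{-1}Ag) + (g^{-1}\delta g)^2,\\
d\mathbb{A} + \delta\mathbb{A} + \mathbb{A}^2 &= g^{-1}\left(dA + A^2\right)g .
\end{align*}
```

Every term containing $g^{-1}\delta g$ appears once with each sign, which is why the curvature has no components along the fibre.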


Although we have defined the connection form $\mathbb{A}$ in terms of the local
bundle co-ordinates $(x, g)$, it is, in fact, an intrinsic quantity, i.e., it has a
global existence, independent of the choice of these co-ordinates. $\mathbb{A}$ has been
constructed so that

i) A vector is annihilated by $\mathbb{A}$ if and only if it is horizontal. In particular,
$\mathbb{A}(D_\mu) = 0$ for all covariant derivatives $D_\mu$.
ii) The connection form is constant on left-invariant vector fields on the
fibres. In particular, $\mathbb{A}(L_a) = i\hat\lambda_a$.

Between them, the globally defined fields $D_\mu \in H_p(P)$ and $L_a \in V_p(P)$ span
the tangent space $TP_p$. Consequently the two properties listed above tell us
how to evaluate $\mathbb{A}$ on any vector, and so define it uniquely and globally.

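From the globally defined and gauge invariant $\mathbb{A}$ and its associated
curvature $\mathbb{F}$, and for any local gauge-choice section $\sigma : (U \subset M) \to P$, we can
recover the gauge-dependent base-space forms $\mathcal{A}$ and $\mathcal{F}$ as the pull-backs

$$\mathcal{A} = \sigma^*\mathbb{A}, \quad \mathcal{F} = \sigma^*\mathbb{F}, \tag{16.112}$$

to $U \subset M$ of the total-space forms. The resulting forms are

$$\mathcal{A} = \left(\sigma^{-1}A_\mu\sigma + \sigma^{-1}\partial_\mu\sigma\right)dx^\mu, \quad \mathcal{F} = \frac{1}{2}\left(\sigma^{-1}F_{\mu\nu}\sigma\right)dx^\mu dx^\nu, \tag{16.113}$$

and coincide with the equations connecting $\mathcal{A}_\mu$ with $A_\mu$ and $\mathcal{F}_{\mu\nu}$ with $F_{\mu\nu}$
that we obtained in section 16.3.2. We should take care to note that the
$dx^\mu$ that appear in $\mathcal{A}$ and $\mathcal{F}$ are differential forms on $M$, while the $dx^\mu$ that
appear in $\mathbb{A}$ and $\mathbb{F}$ are differential forms on $P$. Now the projection $\pi$ is a left
inverse of the gauge-choice section $\sigma$, i.e. $\pi\circ\sigma = {\rm identity}$. The associated
pull-backs are also inverses, but with the order reversed: $\sigma^*\circ\pi^* = {\rm identity}$.
These maps relate the two sets of "$dx^\mu$" by

$$dx^\mu\big|_M = \sigma^*\left(dx^\mu\big|_P\right), \quad dx^\mu\big|_P = \pi^*\left(dx^\mu\big|_M\right). \tag{16.114}$$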


We now explain the advantage of knowing the total space connection and
curvature forms. Consider the Chern character $\propto {\rm tr}\,\mathcal{F}^2$ on the base-space
$M$. We can use the bundle projection $\pi$ to pull this form back to the total
space. From

$$\mathbb{F}_{\mu\nu} = (\sigma^{-1}g)^{-1}\mathcal{F}_{\mu\nu}(\sigma^{-1}g), \tag{16.115}$$

we find that

$$\pi^*\left({\rm tr}\,\mathcal{F}^2\right) = {\rm tr}\,\mathbb{F}^2. \tag{16.116}$$


Now $\mathbb{A}$, $\mathbb{F}$ and $d_{\rm tot}$ have the same calculus properties as $A$, $F$ and $d$. The
manipulations that give

$${\rm tr}\,F^2 = d\,{\rm tr}\left(A\,dA + \frac{2}{3}A^3\right)$$

also show, therefore, that

$${\rm tr}\,\mathbb{F}^2 = d_{\rm tot}\,{\rm tr}\left(\mathbb{A}\,d_{\rm tot}\mathbb{A} + \frac{2}{3}\mathbb{A}^3\right). \tag{16.117}$$

There is a big difference in the significance of the computation, however. The
bundle connection $\mathbb{A}$ is globally defined; consequently, the form

$$\omega_3(\mathbb{A}) = {\rm tr}\left(\mathbb{A}\,d_{\rm tot}\mathbb{A} + \frac{2}{3}\mathbb{A}^3\right) \tag{16.118}$$

is also globally defined. The pull-back to the total space of the Chern character
is $d_{\rm tot}$ exact! This miracle works for all characteristic classes: on
the base-space they are exact only when the bundle is trivial, but on the total
space they are always exact.

We have seen this phenomenon before, for example in exercise 15.7. The
area form $d[{\rm Area}] = \sin\theta\,d\theta d\phi$ is closed but not exact on $S^2$. When pulled
back to $S^3$ by the Hopf map, the area form becomes exact:

$${\rm Hopf}^*d[{\rm Area}] = \sin\theta\,d\theta d\phi = d(-\cos\theta\,d\phi + d\psi). \tag{16.119}$$


16.3.5 Characteristic classes as obstructions 


The generalized Gauss-Bonnet theorem states that, for a compact orientable
even-dimensional manifold $M$, the integral of the Euler class over $M$ is equal
to the Euler character $\chi(M)$. Shiing-Shen Chern used the exactness of the
pull-back of the Euler class to give an elegant intrinsic proof⁵ of this theorem.
He showed that the integral of the Euler class over $M$ was equal to the sum
of the Poincaré-Hopf indices of any tangent vector field on $M$, a sum we
independently know to equal the Euler character $\chi(M)$. We illustrate his
strategy by showing how a non-zero ${\rm ch}_2(F)$ provides a similar index sum for
the singularities of any section of an SU(2)-bundle over a four-dimensional
base space. This result provides an interpretation of characteristic classes as
obstructions to the existence of global sections.

⁵S-J. Chern, Ann. Math. 47 (1946) 85-121. This paper is a readable classic.

Let $\sigma : M \to P$ be a section of an SU(2) principal bundle $P$ over a
four-dimensional compact orientable manifold $M$ without boundary. For
any SU(n) group we have ${\rm ch}_1(F) = 0$, but

$$\int_M {\rm ch}_2(F) = -\frac{1}{8\pi^2}\int_M {\rm tr}\,(F^2) = n, \tag{16.120}$$

can be non-zero.

The section $\sigma$ will, in general, have points $x_i$ where it becomes singular.
We punch infinitesimal holes in $M$ surrounding the singular points. The
manifold $M' = (M \setminus {\rm holes})$ will have as its boundary $\partial M'$ a disjoint union
of small three-spheres. We denote by $\Sigma$ the image of $M'$ under the map
$\sigma : M' \to P$. This $\Sigma$ will be a submanifold of $P$, whose boundary will be
equal in homology to a linear combination of the boundary components of
$M'$ with integer coefficients. We show that the Chern number $n$ is equal to
the sum of these coefficients.

We begin by using the projection $\pi$ to pull back ${\rm ch}_2(F)$ to the bundle,
where we know that

$$\pi^*{\rm ch}_2(F) = -\frac{1}{8\pi^2}\,d_{\rm tot}\,\omega_3(\mathbb{A}). \tag{16.121}$$

Now we can decompose $\omega_3(\mathbb{A})$ into terms of different bi-degree, i.e. into
terms that are $p$-forms in $d$ and $q$-forms in $\delta$:

$$\omega_3(\mathbb{A}) = \omega_3^0 + \omega_2^1 + \omega_1^2 + \omega_0^3. \tag{16.122}$$


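Here the superscript counts the form-degree in $\delta$, and the subscript the
form-degree in $d$. The only term we need to know explicitly is $\omega_0^3$. This comes
from the $g^{-1}\delta g$ part of $\mathbb{A}$, and is

$$\begin{aligned}
\omega_0^3 &= {\rm tr}\left((g^{-1}\delta g)\,\delta(g^{-1}\delta g) + \frac{2}{3}(g^{-1}\delta g)^3\right)\\
&= {\rm tr}\left(-(g^{-1}\delta g)^3 + \frac{2}{3}(g^{-1}\delta g)^3\right)\\
&= -\frac{1}{3}\,{\rm tr}\,(g^{-1}\delta g)^3.
\end{aligned} \tag{16.123}$$

We next use the map $\sigma : M' \to P$ to pull the right-hand side of (16.121)
back from $P$ to $M'$. We recall that acting on forms on $M'$ we have
$\sigma^*\circ\pi^* = {\rm identity}$. Thus

$$\begin{aligned}
\int_M {\rm ch}_2(F) = \int_{M'} {\rm ch}_2(F) &= \int_{M'} \sigma^*\circ\pi^*\,{\rm ch}_2(F)\\
&= -\frac{1}{8\pi^2}\int_{M'} \sigma^*\,d_{\rm tot}\,\omega_3(\mathbb{A})\\
&= -\frac{1}{8\pi^2}\int_{\Sigma} d_{\rm tot}\,\omega_3(\mathbb{A})\\
&= -\frac{1}{8\pi^2}\int_{\partial\Sigma} \omega_3(\mathbb{A})\\
&= \frac{1}{24\pi^2}\int_{\partial\Sigma} {\rm tr}\,(g^{-1}\delta g)^3.
\end{aligned} \tag{16.124}$$

At the first step we have observed that the omitted spheres make a negligible
contribution to the integral over $M$, and at the last step we have used the
fact that the boundary of $\Sigma$ has significant extent only along the fibres,
so all contributions to the integral over $\partial\Sigma$ come from the purely vertical
component of $\omega_3(\mathbb{A})$, which is $\omega_0^3 = -\frac{1}{3}{\rm tr}\,(g^{-1}\delta g)^3$.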

We know (see exercise 15.8) that for maps $g \mapsto U \in {\rm SU}(2)$ we have

$$\int {\rm tr}\,(g^{-1}dg)^3 = 24\pi^2 \times \text{winding number}.$$

We conclude that

$$\int_M {\rm ch}_2(F) = \frac{1}{24\pi^2}\int_{\partial\Sigma} {\rm tr}\,(g^{-1}\delta g)^3 = \sum_{\text{singularities } x_i} N_i, \tag{16.125}$$

where $N_i$ is the Brouwer degree of the map $\sigma : S^3 \to {\rm SU}(2) \cong S^3$ on the
small sphere surrounding $x_i$.

It turns out that for any SU(n) the integral of ${\rm tr}\,(g^{-1}dg)^3$ is $24\pi^2$ times
an integer winding number of $g$ about homology spheres. The second Chern
number of an SU(n)-bundle is therefore also equal to the sum of the
winding-number indices of the section about its singularities. Chern's strategy can
be used to relate other characteristic classes to obstructions to the existence
of global sections of appropriate bundles.

16.3.6 Stora-Zumino descent equations


In the previous sections we met the forms

$$\mathbb{A} = g^{-1}Ag + g^{-1}\delta g \tag{16.126}$$

and

$$\mathcal{A} = \sigma^{-1}A\sigma + \sigma^{-1}d\sigma. \tag{16.127}$$

The group element $g$ labeled points on the fibres and was independent of $x$,
while $\sigma(x)$ was the gauge-choice section of the bundle and depended on $x$.
The two quantities $\mathbb{A}$ and $\mathcal{A}$ look similar, but are not identical. A third
superficially similar but distinct object is met with in the BRST
(Becchi-Rouet-Stora-Tyutin) approach to quantizing gauge theories, and also in the
geometric theory of anomalies. We describe it here to alert the reader to the
potential for confusion.

Rather than attempting to define this new differential form rigorously,
we will first explain how to calculate with it, and only then indicate what it
is. We begin by considering a fixed connection form $A$ on $M$, and its orbit
under the action of the group $\mathcal{G}$ of gauge transformations. The elements of
this infinite dimensional group are maps $g : M \to G$ equipped with pointwise
product $g_1g_2(x) = g_1(x)g_2(x)$. This $g(x)$ is neither the fibre co-ordinate $g$,
nor the gauge choice section $\sigma(x)$. The gauge transformation $g(x)$ acts on $A$
to give $A^g$ where

$$A^g = g^{-1}Ag + g^{-1}dg. \tag{16.128}$$

We now introduce an object

$$v(x) = g^{-1}\delta g, \tag{16.129}$$

and consider

$$\mathfrak{A} = A^g + v = g^{-1}Ag + g^{-1}dg + g^{-1}\delta g. \tag{16.130}$$


This 1-form appears to be a hybrid of the earlier quantities, but we will
see that it has to be considered as something new. The essential difference
from what has gone before is that we want $v$ to behave like $g^{-1}\delta g$, in that
$\delta v = -v^2$, and yet to depend on $x$. In particular we want $\delta$ to behave as
an exterior derivative that implements an infinitesimal gauge transformation
that takes $g \to g + \delta g$. Thus,

$$\begin{aligned}
\delta(g^{-1}dg) &= -(g^{-1}\delta g)(g^{-1}dg) + g^{-1}\delta dg\\
&= -(g^{-1}\delta g)(g^{-1}dg) - (g^{-1}dg)(g^{-1}\delta g) + (g^{-1}dg)(g^{-1}\delta g) - g^{-1}d\delta g\\
&= -v(g^{-1}dg) - (g^{-1}dg)v - dv,
\end{aligned} \tag{16.131}$$

and hence

$$\delta A^g = -vA^g - A^g v - dv. \tag{16.132}$$

Previously $g^{-1}dg = 0$, and so there was no "$dv$" in $\delta$(gauge field).
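We can define a curvature associated with $\mathfrak{A}$,

$$\mathfrak{F} \stackrel{\rm def}{=} d_{\rm tot}\mathfrak{A} + \mathfrak{A}^2, \tag{16.133}$$

and compute

$$\begin{aligned}
\mathfrak{F} &= (d+\delta)(A^g + v) + (A^g + v)^2\\
&= dA^g + dv + \delta A^g + \delta v + (A^g)^2 + A^g v + vA^g + v^2\\
&= dA^g + (A^g)^2\\
&= g^{-1}Fg.
\end{aligned} \tag{16.134}$$

Stora calls (16.134) the Russian formula.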
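Because $\mathfrak{F}$ is yet another gauge transform of $F$, we have

$${\rm tr}\,F^2 = {\rm tr}\,\mathfrak{F}^2 = (d+\delta)\,{\rm tr}\left((A^g+v)(d+\delta)(A^g+v) + \frac{2}{3}(A^g+v)^3\right) \tag{16.135}$$

and can decompose the right-hand side into terms that are simultaneously
$p$-forms in $d$ and $q$-forms in $\delta$.

The left hand side, ${\rm tr}\,\mathfrak{F}^2 = {\rm tr}\,F^2$, of (16.135) is independent of $v$. The
right hand side of (16.135) contains $\omega_3(\mathfrak{A})$ which we expand as

$$\omega_3(A^g + v) = \omega_3^0(A^g) + \omega_2^1(v, A^g) + \omega_1^2(v, A^g) + \omega_0^3(v). \tag{16.136}$$

As in the previous section, the superscript counts the form-degree in $\delta$, and
the subscript the form-degree in $d$. Explicit computation shows that

$$\begin{aligned}
\omega_3^0(A^g) &= {\rm tr}\left(A^g dA^g + \tfrac{2}{3}(A^g)^3\right),\\
\omega_2^1(v, A^g) &= {\rm tr}\,(v\,dA^g),\\
\omega_1^2(v, A^g) &= -{\rm tr}\,(A^g v^2),\\
\omega_0^3(v) &= -\tfrac{1}{3}\,{\rm tr}\,v^3.
\end{aligned} \tag{16.137}$$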


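For example,

$$\omega_0^3(v) = {\rm tr}\left(v\,\delta v + \frac{2}{3}v^3\right) = {\rm tr}\left(v(-v^2) + \frac{2}{3}v^3\right) = -\frac{1}{3}\,{\rm tr}\,v^3. \tag{16.138}$$

With this decomposition, (16.135) falls apart into the chain of descent equations

$$\begin{aligned}
{\rm tr}\,F^2 &= d\omega_3^0(A^g),\\
\delta\omega_3^0(A^g) &= -d\omega_2^1(v, A^g),\\
\delta\omega_2^1(v, A^g) &= -d\omega_1^2(v, A^g),\\
\delta\omega_1^2(v, A^g) &= -d\omega_0^3(v),\\
\delta\omega_0^3(v) &= 0.
\end{aligned} \tag{16.139}$$

Let us verify, for example, the penultimate equation $\delta\omega_1^2(v, A^g) = -d\omega_0^3(v)$.
The left-hand side is

$$-\delta\,{\rm tr}\,(A^g v^2) = -{\rm tr}\left(-A^g v^3 - vA^g v^2 - dv\,v^2\right) = {\rm tr}\,(dv\,v^2), \tag{16.140}$$

the terms involving $A^g$ having cancelled via the cyclic property of the trace
and the fact that $A^g$ anticommutes with $v$. The right-hand side is

$$-d\left(-\frac{1}{3}\,{\rm tr}\,v^3\right) = {\rm tr}\,(dv\,v^2) \tag{16.141}$$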


as required. 
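The step from $\frac{1}{3}d\,{\rm tr}\,v^3$ to ${\rm tr}\,(dv\,v^2)$ uses the graded cyclic property of the trace. The following sketch spells it out, assuming the usual rule ${\rm tr}\,(\alpha\beta) = (-1)^{pq}\,{\rm tr}\,(\beta\alpha)$ for matrix-valued forms of total degree $p$ and $q$, with $v$ counted as degree one and $dv$ as degree two.

```latex
\begin{align*}
d\,{\rm tr}\,v^3 &= {\rm tr}\left(dv\,v^2 - v\,dv\,v + v^2\,dv\right),\\
{\rm tr}\,(v\,dv\,v) &= (-1)^{1\cdot 3}\,{\rm tr}\,(dv\,v\,v) = -\,{\rm tr}\,(dv\,v^2),\\
{\rm tr}\,(v^2\,dv) &= (-1)^{2\cdot 2}\,{\rm tr}\,(dv\,v^2) = {\rm tr}\,(dv\,v^2),\\
d\,{\rm tr}\,v^3 &= 3\,{\rm tr}\,(dv\,v^2).
\end{align*}
```

The same sign rule gives ${\rm tr}\,(vA^g v^2) = -{\rm tr}\,(A^g v^3)$, which is the cancellation of the $A^g$ terms on the left-hand side.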

The descent equations were introduced by Raymond Stora and Bruno Zumino
as a tool for obtaining and systematizing information about anomalies
in the quantum field theory of fermions interacting with the gauge field $A^g$.
The $\omega_p^q(v, A^g)$ are $p$-forms in the $dx^\mu$, and before use they are integrated over
$p$-cycles in $M$. This process is understood to produce local functionals of $A^g$
that remain $q$-forms in $\delta g$. For example, in $2n$ space-time dimensions, the
integral

$$I[g^{-1}\delta g, A^g] = \int_M \omega_{2n}^1(g^{-1}\delta g, A^g) \tag{16.142}$$

has the properties required for it to be a candidate for the anomalous variation
$\delta S[A^g]$ of the fermion effective action due to an infinitesimal gauge
transformation $g \to g + \delta g$. In particular, when $\partial M = \emptyset$, we have

$$\delta I[g^{-1}\delta g, A^g] = \int_M \delta\omega_{2n}^1(v, A^g) = -\int_M d\omega_{2n-1}^2(v, A^g) = 0. \tag{16.143}$$

This is the Wess-Zumino consistency condition that $\delta(\delta S)$ must obey as a
consequence of $\delta^2 = 0$.

In addition to producing a convenient solution of the Wess-Zumino condition,
the descent equations provide a compact derivation of the gauge
transformation properties of useful differential forms. We will not seek to explain
further the physical meaning of these forms, leaving this to a quantum field
theory course.

The similarity between $\mathbb{A}$ and $\mathfrak{A}$ led various authors to attempt to identify
them, and in particular to identify $v(x)$ with the $g^{-1}\delta g$ Maurer-Cartan
form appearing in $\mathbb{A}$. However the physical meaning of expressions such as
$d(g^{-1}\delta g)$ precludes such a simple interpretation. In evaluating $dv \sim d(g^{-1}\delta g)$
on a vector field $\xi^a(x)L_a$ representing an infinitesimal gauge transformation,
we are first to insert the field into $v \sim g^{-1}\delta g$ to obtain the $x$ dependent
Lie algebra element $i\xi^a(x)\hat\lambda_a$, and only then to take the exterior derivative
to obtain $i\hat\lambda_a\,\partial_\mu\xi^a\,dx^\mu$. The result therefore involves derivatives of the
components $\xi^a(x)$. The evaluation of an ordinary differential form on a vector
field never produces derivatives of the vector components.

To understand what the Stora-Zumino forms are, imagine that we equip a
two dimensional fibre bundle $E = M \times F$ with base-space co-ordinate $x$ and
fibre co-ordinate $y$. A $p=1$, $q=1$ form on $E$ will then be $F = f(x, y)\,dx\,\delta y$
for some function $f(x,y)$. There is only one object $\delta y$, and there is no
meaning to integrating $F$ over $x$ to leave a 1-form in $\delta y$ on $E$. The space
of forms introduced by Stora and Zumino, on the other hand, would contain
elements such as

$$J = \int_M j(x, y)\,dx\,\delta y_x \tag{16.144}$$

where there is a distinct $\delta y_x$ for each $x \in M$. If we take, for example,
$j(x,y) = \delta'(x - a)$, we evaluate $J$ on the vector field $Y(x, y)\partial_y$ as

$$J[Y(x,y)\partial_y] = \int \delta'(x-a)\,Y(x,y)\,dx = -Y'(a,y). \tag{16.145}$$

The conclusion is that the 1-form field $v(x) \sim g^{-1}\delta g$ must be
considered as the left-invariant Maurer-Cartan form on the infinite dimensional
Lie group $\mathcal{G}$, rather than a Maurer-Cartan form on the finite dimensional
Lie group $G$. The $\int_M \omega^q_{2n}(v, A^g)$ are therefore elements of the cohomology
group $H^q(A^{\mathcal{G}})$ of the $\mathcal{G}$ orbit of $A$, a rather complicated object. For
a thorough discussion see: J. A. de Azcárraga, J. M. Izquierdo, Lie groups,
Lie Algebras, Cohomology and some Applications in Physics, published by
Cambridge University Press.


Chapter 17 


Complex Analysis I 


Although this chapter is called complex analysis, we will try to develop 
the subject as complex calculus — meaning that we shall follow the calculus 
course tradition of telling you how to do things, and explaining why theorems 
are true, with arguments that would not pass for rigorous proofs in a course 
on real analysis. We try, however, to tell no lies. 

This chapter will focus on the basic ideas that need to be understood 
before we apply complex methods to evaluating integrals, analysing data, 
and solving differential equations. 


17.1 Cauchy-Riemann equations 
We focus on functions, f(z), of a single complex variable, z, where z = x+iy. 


We can think of these as being complex valued functions of two real variables, 
x and y. For example 


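$$f(z) = \sin z \equiv \sin(x + iy) = \sin x\cos iy + \cos x\sin iy = \sin x\cosh y + i\cos x\sinh y. \tag{17.1}$$

Here, we have used

$$\sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right), \quad \sinh x = \frac{1}{2}\left(e^{x} - e^{-x}\right),$$
$$\cos x = \frac{1}{2}\left(e^{ix} + e^{-ix}\right), \quad \cosh x = \frac{1}{2}\left(e^{x} + e^{-x}\right),$$

to make the connection between the circular and hyperbolic functions. We
shall often write $f(z) = u+iv$, where $u$ and $v$ are real functions of $x$ and $y$.
In the present example, $u = \sin x\cosh y$ and $v = \cos x\sinh y$.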
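The decomposition (17.1) can be checked numerically. This is a small sketch using only Python's standard `cmath` and `math` modules; the helper name `sin_via_real_parts` and the sample points are our own choices.

```python
import cmath
import math

def sin_via_real_parts(x, y):
    """Right-hand side of (17.1): sin x cosh y + i cos x sinh y."""
    return complex(math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y))

for x, y in [(0.3, -1.2), (2.0, 0.5), (-1.0, 3.0)]:
    lhs = cmath.sin(complex(x, y))        # sin z evaluated directly
    rhs = sin_via_real_parts(x, y)        # u + iv built from real functions
    print(f"x = {x}, y = {y}: |lhs - rhs| = {abs(lhs - rhs):.2e}")
```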

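If all four partial derivatives

$$\frac{\partial u}{\partial x}, \quad \frac{\partial v}{\partial y}, \quad \frac{\partial v}{\partial x}, \quad \frac{\partial u}{\partial y}$$

exist and are continuous then $f = u + iv$ is differentiable as a complex-valued
function of two real variables. This means that we can approximate
the variation in $f$ as

$$\delta f = \frac{\partial f}{\partial x}\delta x + \frac{\partial f}{\partial y}\delta y + \cdots, \tag{17.3}$$

where the dots represent a remainder that goes to zero faster than linearly
as $\delta x$, $\delta y$ go to zero. We now regroup the terms, setting $\delta z = \delta x + i\delta y$,
$\delta\bar z = \delta x - i\delta y$, so that

$$\delta f = \frac{\partial f}{\partial z}\delta z + \frac{\partial f}{\partial \bar z}\delta\bar z + \cdots, \tag{17.4}$$

where we have defined

$$\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\frac{\partial f}{\partial y}\right), \quad \frac{\partial f}{\partial \bar z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\frac{\partial f}{\partial y}\right). \tag{17.5}$$

Now our function $f(z)$ does not depend on $\bar z$, and so it must satisfy

$$\frac{\partial f}{\partial \bar z} = 0. \tag{17.6}$$

Thus, with $f = u+iv$,

$$\frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)(u + iv) = 0, \tag{17.7}$$

i.e.

$$\left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right) + i\left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right) = 0. \tag{17.8}$$

Since the vanishing of a complex number requires the real and imaginary
parts to be separately zero, this implies that

$$\frac{\partial u}{\partial x} = +\frac{\partial v}{\partial y}, \quad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \tag{17.9}$$

These two relations between $u$ and $v$ are known as the Cauchy-Riemann
equations, although they were probably discovered by Gauss. If our continuous
partial derivatives satisfy the Cauchy-Riemann equations at $z_0 = x_0 + iy_0$
then we say that the function is complex differentiable (or just differentiable)
at that point. By taking $\delta z = z - z_0$, we have

$$\delta f \stackrel{\rm def}{=} f(z) - f(z_0) = \frac{\partial f}{\partial z}(z - z_0) + \cdots, \tag{17.10}$$

where the remainder, represented by the dots, tends to zero faster than $|z - z_0|$
as $z \to z_0$. The validity of this linear approximation to the variation in $f(z)$
is equivalent to the statement that the ratio

$$\frac{f(z) - f(z_0)}{z - z_0} \tag{17.11}$$

tends to a definite limit as $z \to z_0$ from any direction. It is the
direction-independence of this limit that provides a proper meaning to the phrase
"does not depend on $\bar z$." Since we are not allowing dependence on $\bar z$, it is
natural to drop the partial derivative signs and write the limit as an ordinary
derivative

$$\lim_{z\to z_0}\frac{f(z) - f(z_0)}{z - z_0} = \frac{df}{dz}. \tag{17.12}$$

We will also use Newton's fluxion notation

$$\frac{df}{dz} = f'(z). \tag{17.13}$$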


The complex derivative obeys exactly the same calculus rules as ordinary
real derivatives:

$$\begin{aligned}
\frac{d}{dz}z^n &= nz^{n-1},\\
\frac{d}{dz}\sin z &= \cos z,\\
\frac{d}{dz}(fg) &= \frac{df}{dz}g + f\frac{dg}{dz}, \quad \text{etc.}
\end{aligned} \tag{17.14}$$

If the function is differentiable at all points in an arcwise-connected¹ open
set, or domain, $D$, the function is said to be analytic there. The words regular
or holomorphic are also used.
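The direction-independence of the difference quotient can be seen numerically. The sketch below is an illustration only (the function name `difference_quotient`, the step size and the sample point are our own choices): it compares $\sin z$, which is analytic, with $\bar z$, for which the quotient equals $e^{-2i\theta}$ and so depends on the direction $\theta$ of approach.

```python
import cmath

def difference_quotient(f, z0, angle, h=1e-6):
    """Return (f(z0 + dz) - f(z0)) / dz for dz = h * exp(i * angle)."""
    dz = h * cmath.exp(1j * angle)
    return (f(z0 + dz) - f(z0)) / dz

z0 = 0.4 + 0.7j
angles = [0.0, cmath.pi / 4, cmath.pi / 2, 2.0]

# sin z is analytic: every direction gives approximately cos(z0).
analytic = [difference_quotient(cmath.sin, z0, a) for a in angles]

# The conjugate depends on z-bar: the quotient changes with the direction.
non_analytic = [difference_quotient(lambda z: z.conjugate(), z0, a) for a in angles]

for a, q1, q2 in zip(angles, analytic, non_analytic):
    print(f"angle {a:.3f}: sin -> {q1:.6f}, conjugate -> {q2:.6f}")
```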


17.1.1 Conjugate pairs 


The functions u and v comprising the real and imaginary parts of an analytic 
function are said to form a pair of harmonic conjugate functions. Such pairs 
have many properties that are useful for solving physical problems. 

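From the Cauchy-Riemann equations we deduce that

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)u = 0, \quad \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)v = 0, \tag{17.15}$$

and so both the real and imaginary parts of $f(z)$ are automatically harmonic
functions of $x$, $y$.
Further, from the Cauchy-Riemann conditions, we deduce that

$$\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} = 0. \tag{17.16}$$

This means that $\nabla u\cdot\nabla v = 0$. We conclude that, provided that neither
of these gradients vanishes, the pair of curves $u = {\rm const.}$ and $v = {\rm const.}$
intersect at right angles. If we regard $u$ as the potential $\phi$ solving some
electrostatics problem $\nabla^2\phi = 0$, then the curves $v = {\rm const.}$ are the associated
field lines.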
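As an illustration of (17.16), the gradients of $u$ and $v$ can be estimated by central differences and their dot product checked. This is a sketch under our own choices: the test function $f(z) = z^3$, the helper names and the step size are not from the text.

```python
def u(x, y):
    return (complex(x, y) ** 3).real    # real part of z^3

def v(x, y):
    return (complex(x, y) ** 3).imag    # imaginary part of z^3

def grad(func, x, y, h=1e-5):
    """Central-difference estimate of (d func/dx, d func/dy)."""
    return ((func(x + h, y) - func(x - h, y)) / (2 * h),
            (func(x, y + h) - func(x, y - h)) / (2 * h))

def gradient_dot(x, y):
    ux, uy = grad(u, x, y)
    vx, vy = grad(v, x, y)
    return ux * vx + uy * vy            # should vanish, as in (17.16)

print(gradient_dot(0.8, -0.3))
```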
Another application is to fluid mechanics. If $\mathbf{v}$ is the velocity field of an
irrotational (${\rm curl}\,\mathbf{v} = 0$) flow, then we can (perhaps only locally) write the
flow field as a gradient

$$v_x = \partial_x\phi, \quad v_y = \partial_y\phi, \tag{17.17}$$

where $\phi$ is a velocity potential. If the flow is incompressible (${\rm div}\,\mathbf{v} = 0$), then
we can (locally) write it as a curl

$$v_x = \partial_y\chi, \quad v_y = -\partial_x\chi, \tag{17.18}$$

where $\chi$ is a stream function. The curves $\chi = {\rm const.}$ are the flow streamlines.
If the flow is both irrotational and incompressible, then we may use either $\phi$
or $\chi$ to represent the flow, and, since the two representations must agree, we
have

$$\partial_x\phi = +\partial_y\chi, \quad \partial_y\phi = -\partial_x\chi. \tag{17.19}$$

Thus $\phi$ and $\chi$ are harmonic conjugates, and so the complex combination
$\Phi = \phi + i\chi$ is an analytic function called the complex stream function.

¹Arcwise connected means that any two points in $D$ can be joined by a continuous path
that lies wholly within $D$.

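A conjugate $v$ exists (at least locally) for any harmonic function $u$. To
see why, assume first that we have a $(u,v)$ pair obeying the Cauchy-Riemann
equations. Then we can write

$$dv = \frac{\partial v}{\partial x}dx + \frac{\partial v}{\partial y}dy = -\frac{\partial u}{\partial y}dx + \frac{\partial u}{\partial x}dy. \tag{17.20}$$

This observation suggests that if we are given a harmonic function $u$ in some
simply connected domain $D$, we can define a $v$ by setting

$$v(z) = \int_{z_0}^{z}\left(-\frac{\partial u}{\partial y}dx + \frac{\partial u}{\partial x}dy\right) + v(z_0), \tag{17.21}$$

for some real constant $v(z_0)$ and point $z_0$. The integral does not depend on
choice of path from $z_0$ to $z$, and so $v(z)$ is well defined. The path independence
comes about because the curl

$$\frac{\partial}{\partial y}\left(-\frac{\partial u}{\partial y}\right) - \frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right) = -\nabla^2 u \tag{17.22}$$

vanishes, and because in a simply connected domain all paths connecting the
same endpoints are homologous.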

We now verify that this candidate $v(z)$ satisfies the Cauchy-Riemann
relations. The path independence allows us to make our final approach to
$z = x + iy$ along a straight line segment parallel to either the $x$ or the $y$ axis. If we
approach parallel to the $x$ axis, we have

$$v(z) = \int^{x}\left(-\frac{\partial u}{\partial y}\right)dx' + \text{rest of integral}, \tag{17.23}$$

and may use

$$\frac{d}{dx}\int^{x} f(x', y)\,dx' = f(x, y) \tag{17.24}$$

to see that

$$\frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} \tag{17.25}$$

at $(x,y)$. If, instead, we approach parallel to the $y$ axis, we may similarly compute

$$\frac{\partial v}{\partial y} = \frac{\partial u}{\partial x}. \tag{17.26}$$

Thus $v(z)$ does indeed obey the Cauchy-Riemann equations.
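The line-integral construction (17.21) can be tried out numerically. The sketch below makes several assumptions of its own: it takes $u = x^2 - y^2$ with hand-coded partial derivatives, sets $z_0 = 0$ and $v(z_0) = 0$, and integrates along two different polygonal paths with the midpoint rule. Both paths should reproduce the known conjugate $v = 2xy$.

```python
def u_x(x, y):
    return 2 * x        # d/dx of u = x^2 - y^2

def u_y(x, y):
    return -2 * y       # d/dy of u = x^2 - y^2

def conjugate_along(points, steps=2000):
    """Integrate -u_y dx + u_x dy along the polyline through `points` (midpoint rule)."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        dx, dy = (xb - xa) / steps, (yb - ya) / steps
        for k in range(steps):
            xm = xa + (k + 0.5) * dx    # midpoint of the k-th sub-segment
            ym = ya + (k + 0.5) * dy
            total += -u_y(xm, ym) * dx + u_x(xm, ym) * dy
    return total

x, y = 1.3, -0.7
v_x_first = conjugate_along([(0, 0), (x, 0), (x, y)])   # along x, then along y
v_y_first = conjugate_along([(0, 0), (0, y), (x, y)])   # along y, then along x
print(v_x_first, v_y_first, 2 * x * y)
```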

Because of the utility of the harmonic conjugate it is worth giving a practical
recipe for finding it, and so obtaining $f(z)$ when given only its real part
$u(x,y)$. The method we give below is one we learned from John d'Angelo.
It is more efficient than those given in most textbooks. We first observe that
if $f$ is a function of $z$ only, then $\overline{f(z)}$ depends only on $\bar z$. We can therefore
define a function $\bar f$ of $\bar z$ by setting $\overline{f(z)} = \bar f(\bar z)$. Now

$$\frac{1}{2}\left(f(z) + \overline{f(z)}\right) = u(x, y). \tag{17.27}$$

Set

$$x = \frac{1}{2}(z + \bar z), \quad y = \frac{1}{2i}(z - \bar z), \tag{17.28}$$

so

$$u\left(\frac{1}{2}(z + \bar z), \frac{1}{2i}(z - \bar z)\right) = \frac{1}{2}\left(f(z) + \bar f(\bar z)\right). \tag{17.29}$$

Now set $\bar z = 0$, while keeping $z$ fixed! Thus

$$f(z) + \bar f(0) = 2u\left(\frac{z}{2}, \frac{z}{2i}\right). \tag{17.30}$$

The function $f$ is not completely determined of course: the real part of $\bar f(0)$
is $u(0,0)$, but its imaginary part is arbitrary because we can always
add a constant to $v$, and so we have the result

$$f(z) = 2u\left(\frac{z}{2}, \frac{z}{2i}\right) - u(0,0) + iC, \quad C \in \mathbb{R}. \tag{17.31}$$

For example, let $u = x^2 - y^2$. We find

$$f(z) + \bar f(0) = 2\left(\frac{z}{2}\right)^2 - 2\left(\frac{z}{2i}\right)^2 = z^2, \tag{17.32}$$

or, since $u(0,0) = 0$ here,

$$f(z) = z^2 + iC, \quad C \in \mathbb{R}. \tag{17.33}$$

The business of setting $\bar z = 0$, while keeping $z$ fixed, may feel like
a dirty trick, but it can be justified by the (as yet to be proved) fact that $f$
has a convergent expansion as a power series in $z = x + iy$. In this expansion
it is meaningful to let $x$ and $y$ themselves be complex, and so allow $z$ and
$\bar z$ to become two independent complex variables. Anyway, you can always
check ex post facto that your answer is correct.
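The ex post facto check is easy to automate. In this sketch the helper `f_from_real_part` is a hypothetical name, and $u$ must be supplied as a formula that still makes sense for complex arguments. The helper subtracts $u(0,0)$, the real part of $\bar f(0)$, which vanishes for $u = x^2 - y^2$ but not for the second example, $u = e^x\cos y$.

```python
import cmath

def f_from_real_part(u, z):
    """Return 2 u(z/2, z/(2i)) - u(0, 0): f(z) up to an additive imaginary constant."""
    return 2 * u(z / 2, z / 2j) - u(0, 0)

def u_quadratic(x, y):
    return x * x - y * y                  # real part of z^2

def u_exponential(x, y):
    return cmath.exp(x) * cmath.cos(y)    # real part of exp(z)

z = 0.3 + 1.1j
print(f_from_real_part(u_quadratic, z), z ** 2)
print(f_from_real_part(u_exponential, z), cmath.exp(z))
```

For $u = e^x\cos y$ the uncorrected expression $2u(z/2, z/2i)$ equals $e^z + 1$, so the subtraction of $u(0,0) = 1$ is needed to recover $e^z$.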


17.1.2 Conformal mapping 


An analytic function w = f(z) maps subsets of its domain of definition in 
the “z” plane on to subsets in the “w” plane. These maps are often useful 
for solving problems in two dimensional electrostatics or fluid flow. Their 
simplest property is geometrical: such maps are conformal. 




Figure 17.1: An illustration of conformal mapping. The unshaded “triangle” 
marked z is mapped into the other five unshaded regions by the functions 
labeling them. Observe that although the regions are distorted, the angles of 
the “triangle” are preserved by the maps (with the exception of those corners 
that get mapped to infinity). 


Suppose that the derivative of $f(z)$ at a point $z_0$ is non-zero. Then, for $z$
near $z_0$ we have

$$f(z) - f(z_0) \approx A(z - z_0), \tag{17.34}$$

where

$$A = \left.\frac{df}{dz}\right|_{z_0}. \tag{17.35}$$

If you think about the geometric interpretation of complex multiplication
(multiply the magnitudes, add the arguments) you will see that the "$f$"
image of a small neighbourhood of $z_0$ is stretched by a factor $|A|$, and rotated
through an angle $\arg A$, but relative angles are not altered. The map
$z \mapsto f(z) = w$ is therefore isogonal. Our map also preserves orientation (the sense
of rotation of the relative angle) and these two properties, isogonality and
orientation-preservation, are what make the map conformal.² The conformal
property fails at points where the derivative vanishes or becomes infinite.

If we can find a conformal map $z\ (\equiv x + iy) \mapsto w\ (\equiv u + iv)$ of some
domain $D$ to another $D'$ then a function $f(z)$ that solves a potential theory
problem (a Dirichlet boundary-value problem, for example) in $D$ will lead to
$f(z(w))$ solving an analogous problem in $D'$.

Consider, for example, the map $z \mapsto w = z + e^z$. This map takes the
strip $-\infty < x < \infty$, $-\pi \le y \le \pi$ to the entire complex plane with cuts from
$-\infty + i\pi$ to $-1 + i\pi$ and from $-\infty - i\pi$ to $-1 - i\pi$. The cuts occur because
the images of the lines $y = \pm\pi$ get folded back on themselves at $w = -1 \pm i\pi$,
where the derivative of $w(z)$ vanishes. (See figure 17.2.)

In this case, the imaginary part of the function $f(z) = x + iy$ trivially
solves the Dirichlet problem $\nabla^2_{x,y}\,y = 0$ in the infinite strip, with $y = \pi$
on the upper boundary and $y = -\pi$ on the lower boundary. The function
$y(u,v)$, now quite non-trivially, solves $\nabla^2_{u,v}\,y = 0$ in the entire $w$ plane, with
$y = \pi$ on the half-line running from $-\infty + i\pi$ to $-1 + i\pi$, and $y = -\pi$ on the
half-line running from $-\infty - i\pi$ to $-1 - i\pi$. We may regard the images of
the lines $y = {\rm const.}$ (solid curves) as being the streamlines of an irrotational
and incompressible flow out of the end of a tube into an infinite region, or as
the equipotentials near the edge of a pair of capacitor plates. In the latter
case, the images of the lines $x = {\rm const.}$ (dotted curves) are the corresponding
field-lines.
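The folding of the strip edges under $w = z + e^z$ can be confirmed with a few lines of code. The sketch below (function names and sampling range are our own choices) checks that the derivative vanishes at $z = i\pi$, that this point is sent to $w = -1 + i\pi$, and that along the upper edge $y = \pi$ the real part of $w$, which is $x - e^x$, never exceeds $-1$.

```python
import cmath
import math

def w(z):
    return z + cmath.exp(z)

def dw_dz(z):
    return 1 + cmath.exp(z)

fold_point = 1j * math.pi

# Sample the upper edge y = pi for x between -5 and 5.
xs = [-5 + 0.01 * k for k in range(1001)]
edge_real_parts = [w(complex(x, math.pi)).real for x in xs]

print(abs(dw_dz(fold_point)))     # approximately zero
print(w(fold_point))              # approximately -1 + i*pi
print(max(edge_real_parts))       # approximately -1: the edge turns back here
```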
Example: The Joukowski map. This map is famous in the history of aeronautics
because it can be used to map the exterior of a circle to the exterior
of an aerofoil-shaped region. We can use the Milne-Thomson circle theorem
(see 17.3.2) to find the streamlines for the flow past a circle in the $z$ plane,
and then use Joukowski's transformation,

$$w = f(z) = \frac{1}{2}\left(z + \frac{1}{z}\right) \tag{17.36}$$

to map this simple flow to the flow past the aerofoil. To produce an aerofoil
shape, the circle must go through the point $z = 1$, where the derivative of $f$
vanishes, and the image of this point becomes the sharp trailing edge of the
aerofoil.

²If $f$ were a function of $\bar z$ only, then the map would still be isogonal, but would reverse
the orientation. We call such maps antiholomorphic or anti-conformal.

Figure 17.2: Image of part of the strip $-\pi \le y \le \pi$, $-\infty < x < \infty$ under
the map $z \mapsto w = z + e^z$.
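The Joukowski construction described above can be sketched in a few lines. The helper names and the particular circle centre $-0.1 + 0.1i$ are illustrative assumptions only; any circle that passes through $z = 1$ and encloses $z = -1$ would serve. Plotting the points in `aerofoil` is left to the reader.

```python
import cmath
import math

def joukowski(z):
    return 0.5 * (z + 1 / z)

def joukowski_derivative(z):
    return 0.5 * (1 - 1 / z ** 2)

def circle_through_one(centre, n=200):
    """Points on the circle with the given centre that passes through z = 1."""
    return [centre + (1 - centre) * cmath.exp(2j * math.pi * k / n) for k in range(n)]

# The unit circle is flattened onto the real segment [-1, 1].
unit_circle_image = [joukowski(z) for z in circle_through_one(0)]

# An offset circle through z = 1 gives a closed curve with a cusp at w = 1.
aerofoil = [joukowski(z) for z in circle_through_one(-0.1 + 0.1j)]

print(joukowski_derivative(1))                       # 0.0: conformality fails at z = 1
print(max(abs(p.imag) for p in unit_circle_image))   # approximately zero
print(aerofoil[0])                                   # the trailing edge, w = 1
```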


The Riemann mapping theorem 


There are tables of conformal maps for D, D’ pairs, but an underlying prin- 
ciple is provided by the Riemann mapping theorem: 

Theorem: The interior of any simply connected domain $D$ in $\mathbb{C}$ whose boundary
consists of more than one point can be mapped conformally one-to-one
and onto the interior of the unit circle. It is possible to choose an arbitrary
interior point $w_0$ of $D$ and map it to the origin, and to take an arbitrary
direction through $w_0$ and make it the direction of the real axis. With these
two choices the mapping is unique.




Figure 17.3: The Riemann mapping theorem. 


This theorem was first stated in Riemann's PhD thesis in 1851. He regarded
it as "obvious" for the reason that we will give as a physical "proof."
Riemann's argument is not rigorous, however, and it was not until 1912 that
a real proof was obtained by Constantin Carathéodory. A proof that is both
shorter and more in the spirit of Riemann's ideas was given by Leopold Fejér
and Frigyes Riesz in 1922.

For the physical "proof," observe that in the function

$$-\frac{1}{2\pi}\ln z = -\frac{1}{2\pi}\left\{\ln|z| + i\theta\right\}, \tag{17.37}$$

the real part $\phi = -\frac{1}{2\pi}\ln|z|$ is the potential of a unit charge at the origin,
with the additive constant chosen so that $\phi = 0$ on the circle $|z| = 1$.
Now imagine that we have solved the two-dimensional electrostatics problem
of finding the potential for a unit charge located at $w_0 \in D$, also with the
boundary of $D$ being held at zero potential. We have

$$\nabla^2\phi_1 = -\delta^2(w - w_0), \quad \phi_1 = 0 \ \text{on}\ \partial D. \tag{17.38}$$

Now find the $\phi_2$ that is harmonically conjugate to $\phi_1$. Set

$$\phi_1 + i\phi_2 = \Phi(w) = -\frac{1}{2\pi}\ln\left(ze^{i\alpha}\right) \tag{17.39}$$

where $\alpha$ is a real constant. We see that the transformation $w \mapsto z$, or

$$z = e^{-i\alpha}e^{-2\pi\Phi(w)}, \tag{17.40}$$

does the job of mapping the interior of $D$ into the interior of the unit circle,
and the boundary of $D$ to the boundary of the unit circle. Note how our
freedom to choose the constant $\alpha$ is what allows us to "take an arbitrary
direction through $w_0$ and make it the direction of the real axis."

Example: To find the map that takes the upper half-plane into the unit
circle, with the point $w = i$ mapping to the origin, we use the method of
images to solve for the complex potential of a unit charge at $w = i$:

$$\phi_1 + i\phi_2 = -\frac{1}{2\pi}\left(\ln(w - i) - \ln(w + i)\right) \equiv -\frac{1}{2\pi}\ln\left(e^{i\alpha}z\right).$$

Therefore

$$z = e^{-i\alpha}\,\frac{w - i}{w + i}. \tag{17.41}$$

We immediately verify that this works: we have $|z| = 1$ when $w$ is real,
and $z = 0$ at $w = i$.
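The verification just described takes only a few lines of code. The function name `half_plane_to_disc` and the sample points are our own; $\alpha$ is the free real constant of (17.41).

```python
import cmath

def half_plane_to_disc(w, alpha=0.0):
    """The map (17.41): z = exp(-i alpha) (w - i) / (w + i)."""
    return cmath.exp(-1j * alpha) * (w - 1j) / (w + 1j)

print(half_plane_to_disc(1j))                  # 0: the point w = i goes to the origin
print(abs(half_plane_to_disc(2.5)))            # 1: real w lands on the unit circle
print(abs(half_plane_to_disc(0.3 + 0.2j)))     # < 1: upper half-plane goes inside
```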
The difficulty with the physical argument is that it is not clear that a solution
to the point-charge electrostatics problem exists. In three dimensions,
for example, there is no solution when the boundary has a sharp inward
directed spike. (We cannot physically realize such a situation either: the
electric field becomes unboundedly large near the tip of a spike, and boundary
charge will leak off and neutralize the point charge.) There might well
be analogous difficulties in two dimensions if the boundary of $D$ is pathological.
However, the fact that there is a proof of the Riemann mapping
theorem shows that the two-dimensional electrostatics problem does always
have a solution, at least in the interior of $D$, even if the boundary is an
infinite-length fractal. However, unless $\partial D$ is reasonably smooth the resulting
Riemann map cannot be continuously extended to the boundary. When
the boundary of $D$ is a smooth closed curve, then the boundary of $D$
will map one-to-one and continuously onto the boundary of the unit circle.


Exercise 17.1: Van der Pauw's Theorem.³ This problem explains a practical
method for determining the conductivity $\sigma$ of a material, given a sample in
the form of a wafer of uniform thickness $d$, but of irregular shape. In practice
at the Philips company in Eindhoven, this was a wafer of semiconductor cut
from an unmachined boule.


Figure 17.4: A thin semiconductor wafer with attached leads. 


We attach leads to point contacts $A, B, C, D$, taken in anticlockwise order, on
the periphery of the wafer and drive a current $I_{AB}$ from $A$ to $B$. We record the
potential difference $V_D - V_C$ and so find $R_{AB,DC} = (V_D - V_C)/I_{AB}$. Similarly
we measure $R_{BC,AD}$. The current flow in the wafer is assumed to be two
dimensional, and to obey

$$\mathbf{J} = -(\sigma d)\nabla V, \quad \nabla\cdot\mathbf{J} = 0,$$

and $\mathbf{n}\cdot\mathbf{J} = 0$ at the boundary (except at the current source and drain). The
potential $V$ is therefore harmonic, with Neumann boundary conditions.

³L. J. Van der Pauw, Philips Research Reps. 13 (1958) 1. See also A. M. Thompson,
D. G. Lampard, Nature 177 (1956) 888, and D. G. Lampard, Proc. Inst. Elec. Eng. C.
104 (1957) 271, for the "Calculable Capacitor."

Van der Pauw claims that

$$\exp\{-\pi\sigma d R_{AB,DC}\} + \exp\{-\pi\sigma d R_{BC,AD}\} = 1.$$

From this $\sigma d$ can be found numerically.

a) First show that Van der Pauw's claim is true if the wafer were the entire
upper half-plane with $A, B, C, D$ on the real axis with $x_A < x_B < x_C < x_D$.

b) Next, taking care to consider the transformation of the current source
terms and the Neumann boundary conditions, show that the claim is
invariant under conformal maps, and, by mapping the wafer to the upper
half-plane, show that it is true in general.
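The remark that $\sigma d$ "can be found numerically" can be made concrete. The sketch below is one possible approach and not a method taken from the exercise: it solves Van der Pauw's relation for $s = \sigma d$ by bisection, using the fact that the left-hand side decreases monotonically from 2 to 0 as $s$ grows. The function name `sheet_conductance` and the tolerance are our own choices.

```python
import math

def sheet_conductance(r_ab_dc, r_bc_ad, tol=1e-12):
    """Solve exp(-pi s R1) + exp(-pi s R2) = 1 for s = sigma * d by bisection."""
    if r_ab_dc <= 0 or r_bc_ad <= 0:
        raise ValueError("both resistances must be positive")

    def residual(s):
        return (math.exp(-math.pi * s * r_ab_dc)
                + math.exp(-math.pi * s * r_bc_ad) - 1.0)

    lo, hi = 0.0, 1.0 / min(r_ab_dc, r_bc_ad)   # residual(lo) = 1 > 0
    while residual(hi) > 0:                      # enlarge the bracket if needed
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid                             # root lies above mid
        else:
            hi = mid                             # root lies at or below mid
    return 0.5 * (lo + hi)

# Symmetric case R1 = R2 = R, where the exact answer is ln(2) / (pi R).
print(sheet_conductance(1.0, 1.0), math.log(2) / math.pi)
```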


17.2 Complex integration: Cauchy and Stokes 


In this section we will define the integral of an analytic function, and make 
contact with the exterior calculus from chapters 11-13. The most obvious 
difference between the real and complex integral is that in evaluating the 
definite integral of a function in the complex plane we must specify the path 
along which we integrate. When this path of integration is the boundary of 
a region, it is often called a contour from the use of the word in the graphic 
arts to describe the outline of something. The integrals themselves are then 
called contour integrals. 


17.2.1 The complex integral 
The complex integral

$$ \int_\Gamma f(z)\,dz \tag{17.42} $$

over a path $\Gamma$ may be defined by expanding out the real and imaginary parts

$$ \int_\Gamma f(z)\,dz \equiv \int_\Gamma (u+iv)(dx+i\,dy) = \int_\Gamma (u\,dx - v\,dy) + i\int_\Gamma (v\,dx + u\,dy) \tag{17.43} $$

and treating the two integrals on the right hand side as standard vector-calculus
line-integrals of the form $\int \mathbf{v}\cdot d\mathbf{r}$, one with $\mathbf{v} \to (u,-v)$ and one
with $\mathbf{v} \to (v,u)$.


Figure 17.5: A chain approximation to the curve $\Gamma$.


The complex integral can also be constructed as the limit of a Riemann sum
in a manner parallel to the definition of the real-variable Riemann integral
of elementary calculus. Replace the path $\Gamma$ with a chain composed of $N$
line-segments $z_0$-to-$z_1$, $z_1$-to-$z_2$, all the way to $z_{N-1}$-to-$z_N$. Now let $\xi_m$ lie
on the line segment joining $z_{m-1}$ and $z_m$. Then the integral $\int_\Gamma f(z)\,dz$ is the
limit of the (Riemann) sum

$$ S = \sum_{m=1}^{N} f(\xi_m)(z_m - z_{m-1}) \tag{17.44} $$

as $N$ gets large and all the $|z_m - z_{m-1}| \to 0$. For this definition to make
sense and be useful, the limit must be independent of both how we chop up
the curve and how we select the points $\xi_m$. This will be the case when the
integration path is smooth and the function being integrated is continuous.

The Riemann-sum definition of the integral leads to a useful inequality:
combining the triangle inequality $|a + b| \le |a| + |b|$ with $|ab| = |a|\,|b|$ we
deduce that

$$ \left|\sum_{m=1}^{N} f(\xi_m)(z_m - z_{m-1})\right| \le \sum_{m=1}^{N} \left|f(\xi_m)(z_m - z_{m-1})\right| = \sum_{m=1}^{N} |f(\xi_m)|\,|z_m - z_{m-1}|. \tag{17.45} $$

For sufficiently smooth curves the last sum converges to the real integral
$\int_\Gamma |f(z)|\,|dz|$, and we deduce that

$$ \left|\int_\Gamma f(z)\,dz\right| \le \int_\Gamma |f(z)|\,|dz|. \tag{17.46} $$

For curves $\Gamma$ that are smooth enough to have a well-defined length $|\Gamma|$, we
will have $\int_\Gamma |dz| = |\Gamma|$. From this identification we conclude that if $|f| \le M$
on $\Gamma$, then we have the Darboux inequality

$$ \left|\int_\Gamma f(z)\,dz\right| \le M|\Gamma|. \tag{17.47} $$

We shall find many uses for this inequality.

The Riemann sum definition also makes it clear that if $f(z)$ is the derivative
of another analytic function $g(z)$, i.e.

$$ f(z) = \frac{dg}{dz}, \tag{17.48} $$

then, for $\Gamma$ a smooth path from $z = a$ to $z = b$, we have

$$ \int_\Gamma f(z)\,dz = g(b) - g(a). \tag{17.49} $$


This claim is established by approximating $f(\xi_m) \approx (g(z_m) - g(z_{m-1}))/(z_m - z_{m-1})$,
and observing that the resulting Riemann sum

$$ \sum_{m=1}^{N} \left(g(z_m) - g(z_{m-1})\right) \tag{17.50} $$

telescopes. The approximation to the derivative will become accurate in the
limit $|z_m - z_{m-1}| \to 0$. Thus, when $f(z)$ is the derivative of another function,
the integral is independent of the route that $\Gamma$ takes from $a$ to $b$.
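
The Riemann-sum construction (17.44) and the path independence expressed by (17.49) can be tried out numerically. The following Python sketch is only an illustration: the helper `riemann_sum`, the midpoint choice for $\xi_m$, the two sample paths, and the test function $f(z) = 3z^2$ (the derivative of $g(z) = z^3$) are all our own choices.

```python
def riemann_sum(f, path, n=20000):
    """Approximate the integral of f along path(t), 0 <= t <= 1, by the sum (17.44).

    Each xi_m is taken as the midpoint of the segment z_{m-1}-to-z_m.
    """
    total = 0j
    z_prev = path(0.0)
    for m in range(1, n + 1):
        z_next = path(m / n)
        xi = 0.5 * (z_prev + z_next)
        total += f(xi) * (z_next - z_prev)
        z_prev = z_next
    return total

def f(z):
    return 3 * z**2      # f = dg/dz

def g(z):
    return z**3

a, b = 0j, 1 + 1j

def straight(t):
    return a + t * (b - a)        # straight line from a to b

def parabola(t):
    return complex(t, t * t)      # a different route from a to b

I_straight = riemann_sum(f, straight)
I_parabola = riemann_sum(f, parabola)
exact = g(b) - g(a)               # equals -2 + 2i

# Darboux bound on the straight path: |f| <= 3|b|^2 = 6 and |Gamma| = sqrt(2).
darboux_bound = 6 * 2**0.5
```

Both sums should agree with $g(b) - g(a)$ to within the discretization error, and the modulus of the integral along the straight path stays below the Darboux bound $M|\Gamma|$.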

We shall see that any analytic function is (at least locally) the derivative 
of another analytic function, and so this path independence holds generally 
— provided that we do not try to move the integration contour over a place 
where f ceases to be differentiable. This is the essence of what is known as 
Cauchy’s Theorem — although, as with much of complex analysis, the result 
was known to Gauss. 


17.2.2 Cauchy’s theorem 


Before we state and prove Cauchy’s theorem, we must introduce an orien- 
tation convention and some traditional notation. Recall that a p-chain is a 
finite formal sum of p-dimensional oriented surfaces or curves, and that a
p-cycle is a p-chain $\Gamma$ whose boundary vanishes: $\partial\Gamma = 0$. A 1-cycle that
consists of only a single connected component is a closed curve. We will mostly
consider integrals over simple closed curves — these being curves that do not 
self intersect — or 1-cycles consisting of finite formal sums of such curves. 
The orientation of a simple closed curve can be described by the sense, clock- 
wise or anticlockwise, in which we traverse it. We will adopt the convention 
that a positively oriented curve is one such that the integration is performed
in an anticlockwise direction. The integral over a chain $\Gamma$ of oriented simple
closed curves will be denoted by the symbol $\oint_\Gamma f\,dz$.

We now establish Cauchy’s theorem by relating it to our previous work
with exterior derivatives: Suppose that $f$ is analytic with domain $D$, so that
$\partial_{\bar z} f = 0$ within $D$. We therefore have that the exterior derivative of $f$ is

$$ df = \partial_z f\,dz + \partial_{\bar z} f\,d\bar z = \partial_z f\,dz. \tag{17.51} $$

Now suppose that the simple closed curve $\Gamma$ is the boundary of a region
$\Omega \subset D$. We can exploit Stokes’ theorem to deduce that

$$ \oint_{\Gamma=\partial\Omega} f(z)\,dz = \int_\Omega d\left(f(z)\,dz\right) = \int_\Omega (\partial_z f)\,dz\wedge dz = 0. \tag{17.52} $$

The last integral is zero because $dz\wedge dz = 0$. We may state our result as:
Theorem (Cauchy, in modern language): The integral of an analytic function
over a 1-cycle that is homologous to zero vanishes.

The zero result is only guaranteed if the function $f$ is analytic throughout
the region $\Omega$. For example, if $\Gamma$ is the unit circle $z = e^{i\theta}$ then

$$ \oint_\Gamma \left(\frac{1}{z}\right) dz = \int_0^{2\pi} e^{-i\theta}\, d\!\left(e^{i\theta}\right) = i\int_0^{2\pi} d\theta = 2\pi i. \tag{17.53} $$

Cauchy’s theorem is not applicable because $1/z$ is singular, i.e. not differentiable,
at $z = 0$. The formula (17.53) will hold for $\Gamma$ any contour homologous
to the unit circle in $\mathbb{C}\setminus 0$, the complex plane punctured by the removal of
the point $z = 0$. Thus

$$ \oint_\Gamma \left(\frac{1}{z}\right) dz = 2\pi i \tag{17.54} $$

for any contour $\Gamma$ that encloses the origin. We can deduce a rather remarkable
formula from (17.54): Writing $\Gamma = \partial\Omega$ with anticlockwise orientation, we use
Stokes’ theorem to obtain

$$ \oint_\Gamma \left(\frac{1}{z}\right) dz = \int_\Omega \partial_{\bar z}\left(\frac{1}{z}\right) d\bar z\wedge dz = \begin{cases} 2\pi i, & 0\in\Omega,\\ 0, & 0\notin\Omega. \end{cases} \tag{17.55} $$

Since $d\bar z\wedge dz = 2i\,dx\wedge dy$, we have established that

$$ \partial_{\bar z}\left(\frac{1}{z}\right) = \pi\delta^2(x,y). \tag{17.56} $$

This rather cryptic formula encodes one of the most useful results in mathematics.

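Perhaps perversely, functions that are more singular than $1/z$ have vanishing
integrals about their singularities. With $\Gamma$ again the unit circle, we have

$$ \oint_\Gamma \left(\frac{1}{z^2}\right) dz = \int_0^{2\pi} e^{-2i\theta}\, d\!\left(e^{i\theta}\right) = i\int_0^{2\pi} e^{-i\theta}\, d\theta = 0. \tag{17.57} $$

The same is true for all higher integer powers:

$$ \oint_\Gamma \left(\frac{1}{z^n}\right) dz = 0, \qquad n \ge 2. \tag{17.58} $$

We can understand this vanishing in another way, by evaluating the integral as

$$ \oint_\Gamma \left(\frac{1}{z^n}\right) dz = \oint_\Gamma \frac{d}{dz}\left(-\frac{1}{n-1}\frac{1}{z^{n-1}}\right) dz = \left[-\frac{1}{n-1}\frac{1}{z^{n-1}}\right]_\Gamma = 0, \qquad n \ne 1. \tag{17.59} $$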


Here, the notation $[A]_\Gamma$ means the difference in the value of $A$ at two ends
of the integration path $\Gamma$. For a closed curve the difference is zero because
the two ends are at the same point. This approach reinforces the fact that
the complex integral can be computed from the “anti-derivative” in the same
way as the real-variable integral. We also see why $1/z$ is special. It is the
derivative of $\ln z = \ln|z| + i\arg z$, and $\ln z$ is not really a function, as it is
multivalued. In evaluating $[\ln z]_\Gamma$ we must follow the continuous evolution
of $\arg z$ as we traverse the contour. As the origin is within the contour, this
angle increases by $2\pi$, and so

$$ [\ln z]_\Gamma = [i\arg z]_\Gamma = i\left(\arg e^{2\pi i} - \arg e^{0i}\right) = 2\pi i. \tag{17.60} $$
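
The contour integrals of $1/z^n$ discussed above are easy to check numerically. The Python sketch below is illustrative only: the helper `circle_integral`, its midpoint sampling in $\theta$, and the particular circles are assumptions of ours.

```python
import cmath
import math

def circle_integral(f, center=0j, radius=1.0, n=4000):
    """Approximate the anticlockwise integral of f around a circle.

    Uses z = center + radius*exp(i*theta) and dz = i*radius*exp(i*theta) dtheta,
    sampled at n equally spaced angles.
    """
    total = 0j
    dtheta = 2 * math.pi / n
    for m in range(n):
        w = radius * cmath.exp(1j * (m + 0.5) * dtheta)
        total += f(center + w) * (1j * w) * dtheta
    return total

# Powers z^(-p) on the unit circle: only p = 1 gives a non-zero answer.
I1 = circle_integral(lambda z: 1 / z)
I2 = circle_integral(lambda z: 1 / z**2)
I3 = circle_integral(lambda z: 1 / z**3)

# A displaced circle that still encloses the origin, and one that does not.
I_enclosing = circle_integral(lambda z: 1 / z, center=0.3 + 0.2j, radius=1.0)
I_outside = circle_integral(lambda z: 1 / z, center=3.0 + 0j, radius=1.0)
```

The displaced contour that still encloses the origin gives the same $2\pi i$, in line with (17.54), while the contour that misses the origin gives zero, as Cauchy’s theorem requires.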


Exercise 17.2: Suppose $f(z)$ is analytic in a simply-connected domain $D$, and
$z_0 \in D$. Set $g(z) = \int_{z_0}^{z} f(z)\,dz$ along some path in $D$ from $z_0$ to $z$. Use the
path-independence of the integral to compute the derivative of $g(z)$ and show
that

$$ f(z) = \frac{d}{dz}\,g(z). $$

This confirms our earlier claim that any analytic function is the derivative of
some other analytic function.


Exercise 17.3: The “D-bar” problem: Suppose we are given a simply-connected
domain $\Omega$, and a function $f(z,\bar z)$ defined on it, and wish to find a function
$F(z,\bar z)$ such that

$$ \frac{\partial F(z,\bar z)}{\partial \bar z} = f(z,\bar z), \qquad (z,\bar z)\in\Omega. $$

Use (17.56) to argue formally that the general solution is

$$ F(\zeta,\bar\zeta) = -\frac{1}{\pi}\int_\Omega \frac{f(z,\bar z)}{z-\zeta}\,dx\wedge dy + g(\zeta), $$

where $g(\zeta)$ is an arbitrary analytic function. This result can be shown to be
correct by more rigorous reasoning.


17.2.3. The residue theorem 


The essential tool for computations with complex integrals is provided by 
the residue theorem. With the aid of this theorem, the evaluation of contour 
integrals becomes easy. All one has to do is identify points at which the 
function being integrated blows up, and examine just how it blows up. 

If, near the point $z_i$, the function can be written

$$ f(z) = \left\{ \frac{a_N^{(i)}}{(z-z_i)^N} + \cdots + \frac{a_2^{(i)}}{(z-z_i)^2} + \frac{a_1^{(i)}}{(z-z_i)} \right\} g^{(i)}(z), \tag{17.61} $$

where $g^{(i)}(z)$ is analytic and non-zero at $z_i$, then $f(z)$ has a pole of order $N$ at
$z_i$. If $N = 1$ then $f(z)$ is said to have a simple pole at $z_i$. We can normalize
$g^{(i)}(z)$ so that $g^{(i)}(z_i) = 1$, and then the coefficient, $a_1^{(i)}$, of $1/(z - z_i)$ is
called the residue of the pole at $z_i$. The coefficients of the more singular
terms do not influence the result of the integral, but $N$ must be finite for the
singularity to be called a pole.

Theorem: Let the function $f(z)$ be analytic within and on the boundary
$\Gamma = \partial D$ of a simply connected domain $D$, with the exception of a finite number
of points at which $f(z)$ has poles. Then

$$ \oint_\Gamma f(z)\,dz = \sum_{\text{poles}\,\in D} 2\pi i\,(\text{residue at pole}), \tag{17.62} $$

the integral being traversed in the positive (anticlockwise) sense.
We prove the residue theorem by drawing small circles $C_i$ about each
singular point $z_i$ in $D$.


Figure 17.6: Circles for the residue theorem.

We now assert that

$$ \oint_\Gamma f(z)\,dz = \sum_i \oint_{C_i} f(z)\,dz, \tag{17.63} $$

because the 1-cycle

$$ C \equiv \Gamma - \sum_i C_i = \partial\Omega \tag{17.64} $$

is the boundary of a region $\Omega$ in which $f$ is analytic, and hence $C$ is homologous
to zero. If we make the radius $R_i$ of the circle $C_i$ sufficiently small, we
may replace each $g^{(i)}(z)$ by its limit $g^{(i)}(z_i) = 1$, and so take

$$ f(z) \to \frac{a_1^{(i)}}{(z-z_i)} + \frac{a_2^{(i)}}{(z-z_i)^2} + \cdots + \frac{a_N^{(i)}}{(z-z_i)^N} \tag{17.65} $$

on $C_i$. We then evaluate the integral over $C_i$ by using our previous results
to get

$$ \oint_{C_i} f(z)\,dz = 2\pi i\,a_1^{(i)}. \tag{17.66} $$

The integral around $\Gamma$ is therefore equal to $2\pi i\sum_i a_1^{(i)}$.

The restriction to contours containing only finitely many poles arises for
two reasons: Firstly, with infinitely many poles, the sum over $i$ might not
converge; secondly, there may be a point whose every neighbourhood contains
infinitely many of the poles, and there our construction of drawing circles
around each individual pole would not be possible.
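
As a numerical illustration of the residue theorem (17.62), the Python sketch below integrates a function with two simple poles around two different circles. The test function, the pole positions, and the helper `circle_integral` are our own illustrative choices.

```python
import cmath
import math

def circle_integral(f, center, radius, n=4000):
    """Anticlockwise integral of f around the circle |z - center| = radius."""
    total = 0j
    dtheta = 2 * math.pi / n
    for m in range(n):
        w = radius * cmath.exp(1j * (m + 0.5) * dtheta)
        total += f(center + w) * (1j * w) * dtheta
    return total

def f(z):
    # Simple poles at z = 0.5 and z = -0.25.
    return cmath.exp(z) / ((z - 0.5) * (z + 0.25))

# Residues read off by removing the vanishing factor at each pole.
res_a = cmath.exp(0.5) / (0.5 + 0.25)
res_b = cmath.exp(-0.25) / (-0.25 - 0.5)

# The unit circle encloses both poles; the small circle encloses only z = 0.5.
I_both = circle_integral(f, 0j, 1.0)
I_one = circle_integral(f, 0.5 + 0j, 0.4)
```

The first integral should reproduce $2\pi i$ times the sum of both residues, and the second $2\pi i$ times the residue at $z = 0.5$ alone.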


Exercise 17.4: Poisson’s Formula. The function $f(z)$ is analytic in $|z| < R'$.
Prove that if $|a| < R < R'$,

$$ f(a) = \frac{1}{2\pi i}\oint_{|z|=R} \frac{R^2 - \bar a a}{(z-a)(R^2 - \bar a z)}\, f(z)\,dz. $$

Deduce that, for $0 < r < R$,

$$ f(re^{i\theta}) = \frac{1}{2\pi}\int_0^{2\pi} \frac{R^2 - r^2}{R^2 - 2Rr\cos(\theta-\phi) + r^2}\, f(Re^{i\phi})\,d\phi. $$

Show that this formula solves the boundary-value problem for Laplace’s equation
in the disc $|z| < R$.
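
The second formula of exercise 17.4 can be checked numerically before attempting the proof. In this Python sketch the function `poisson`, the sample analytic function $z^3 + 2z$, and the values of $R$, $r$ and $\theta$ are illustrative assumptions of ours.

```python
import cmath
import math

def poisson(boundary_values, R, r, theta, n=2000):
    """Evaluate (1/2pi) * integral of the Poisson kernel times boundary data.

    boundary_values(phi) should return f(R*exp(i*phi)).
    """
    total = 0j
    for m in range(n):
        phi = 2 * math.pi * (m + 0.5) / n
        kernel = (R * R - r * r) / (R * R - 2 * R * r * math.cos(theta - phi) + r * r)
        total += kernel * boundary_values(phi)
    return total / n      # the factor 2*pi/n from d(phi) cancels the 1/(2*pi)

def f(z):
    return z**3 + 2 * z   # analytic everywhere

R, r, theta = 2.0, 1.2, 0.7
interior_value = poisson(lambda phi: f(R * cmath.exp(1j * phi)), R, r, theta)
direct_value = f(r * cmath.exp(1j * theta))
```

Because the kernel is real, the same routine applied to the real part of the boundary data alone returns the harmonic function $\mathrm{Re}\,f$ inside the disc.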


Exercise 17.5: Bergman Kernel. The Hilbert space of analytic functions on a
domain $D$ with inner product

$$ \langle f, g\rangle = \int_D \bar f g\,dx\,dy $$

is called the Bergman⁴ space of $D$.


a) Suppose that $\varphi_n(z)$, $n = 0,1,2,\ldots$, are a complete set of orthonormal
functions on the Bergman space. Show that

$$ K(\zeta,z) = \sum_{m=0}^{\infty} \varphi_m(\zeta)\,\overline{\varphi_m(z)} $$

has the property that

$$ g(\zeta) = \iint_D K(\zeta,z)\,g(z)\,dx\,dy $$

for any function $g$ analytic in $D$. Thus $K(\zeta,z)$ plays the role of the delta
function on the space of analytic functions on $D$. This object is called
the reproducing or Bergman kernel. By taking $g(z) = \varphi_n(z)$, show that
it is the unique integral kernel with the reproducing property.

⁴ This space should not be confused with Bargmann-Fock space which is the space
of analytic functions on the entirety of $\mathbb{C}$ with inner product

$$ \langle f, g\rangle = \int_{\mathbb{C}} e^{-|z|^2}\,\bar f g\,d^2z. $$

Stefan Bergman and Valentine Bargmann are two different people.

b) Consider the case of $D$ being the unit circle. Use the Gram-Schmidt
procedure to construct an orthonormal set from the functions $z^n$, $n = 0,1,2,\ldots$.
Use the result of part a) to conjecture (because we have not
proved that the set is complete) that, for the unit circle,

$$ K(\zeta,z) = \frac{1}{\pi}\,\frac{1}{(1-\zeta\bar z)^2}. $$

c) For any smooth, complex valued, function $g$ defined on a domain $D$ and
its boundary, use Stokes’ theorem to show that

$$ \iint_D \partial_{\bar z}\, g(z,\bar z)\,dx\,dy = \frac{1}{2i}\oint_{\partial D} g(z,\bar z)\,dz. $$

Use this to verify that the $K(\zeta,z)$ you constructed in part b) is
indeed a (and hence “the”) reproducing kernel.

d) Now suppose that $D$ is a simply connected domain whose boundary $\partial D$
is a smooth curve. We know from the Riemann mapping theorem that
there exists an analytic function $f(z) = f(z;\zeta)$ that maps $D$ onto the
interior of the unit circle in such a way that $f(\zeta) = 0$ and $f'(\zeta)$ is real
and non-zero. Show that if we set $K(\zeta,z) = \overline{f'(z)}\,f'(\zeta)/\pi$, then, by using
part c) together with the residue theorem to evaluate the integral over
the boundary, we have

$$ g(\zeta) = \iint_D K(\zeta,z)\,g(z)\,dx\,dy. $$

This $K(\zeta,z)$ must therefore be the reproducing kernel. We see that if we
know $K$ we can recover the map $f$ from

$$ f'(z) = \sqrt{\frac{\pi}{K(\zeta,\zeta)}}\;K(z,\zeta). $$


e) Apply the formula from part d) to the unit circle, and so deduce that

$$ f(z) = \frac{z-\zeta}{1-\bar\zeta z} $$

is the unique function that maps the unit circle onto itself with the point
$\zeta$ mapping to the origin and with the horizontal direction through $\zeta$
remaining horizontal.


17.3 Applications


We now know enough about complex variables to work through some interesting
applications, including the mechanism by which an aeroplane flies.


17.3.1 Two-dimensional vector calculus 


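It is often convenient to use complex co-ordinates for vectors and tensors. In
these co-ordinates the standard metric on $\mathbb{R}^2$ becomes

$$ \text{“}ds^2\text{”} = dx\otimes dx + dy\otimes dy = d\bar z\otimes dz = g_{zz}\,dz\otimes dz + g_{z\bar z}\,dz\otimes d\bar z + g_{\bar z z}\,d\bar z\otimes dz + g_{\bar z\bar z}\,d\bar z\otimes d\bar z, \tag{17.67} $$

so the complex co-ordinate components of the metric tensor are $g_{zz} = g_{\bar z\bar z} = 0$,
$g_{z\bar z} = g_{\bar z z} = \frac12$. The inverse metric tensor is $g^{z\bar z} = g^{\bar z z} = 2$, $g^{zz} = g^{\bar z\bar z} = 0$.
In these co-ordinates the Laplacian is

$$ \nabla^2 = g^{ij}\partial^2_{ij} = 2(\partial_z\partial_{\bar z} + \partial_{\bar z}\partial_z). \tag{17.68} $$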


When $f$ has singularities, it is not safe to assume that $\partial_z\partial_{\bar z} f = \partial_{\bar z}\partial_z f$. For
example, from

$$ \partial_{\bar z}\left(\frac{1}{z}\right) = \pi\delta^2(x,y), \tag{17.69} $$

we deduce that

$$ \partial_{\bar z}\partial_z \ln z = \pi\delta^2(x,y). \tag{17.70} $$

When we evaluate the derivatives in the opposite order, however, we have

$$ \partial_z\partial_{\bar z} \ln z = 0. \tag{17.71} $$

To understand the source of the non-commutativity, take real and imaginary
parts of these last two equations. Write $\ln z = \ln|z| + i\theta$, where $\theta = \arg z$,
and add and subtract. We find

$$ \nabla^2 \ln|z| = 2\pi\delta^2(x,y), \qquad (\partial_x\partial_y - \partial_y\partial_x)\theta = 2\pi\delta^2(x,y). \tag{17.72} $$

The first of these shows that $\frac{1}{2\pi}\ln|z|$ is the Green function for the Laplace
operator, and the second reveals that the vector field $\nabla\theta$ is singular, having
a delta function “curl” at the origin.

If we have a vector field $\mathbf{v}$ with contravariant components $(v^x, v^y)$ and (numerically
equal) covariant components $(v_x, v_y)$ then the covariant components
in the complex co-ordinate system are $v_z = \frac12(v_x - iv_y)$ and $v_{\bar z} = \frac12(v_x + iv_y)$.
This can be obtained by using the change of co-ordinates rule, but a quicker
route is to observe that

$$ \mathbf{v}\cdot d\mathbf{r} = v_x\,dx + v_y\,dy = v_z\,dz + v_{\bar z}\,d\bar z. \tag{17.73} $$

Now

$$ \partial_{\bar z} v_z = \frac14(\partial_x v_x + \partial_y v_y) + \frac{i}{4}(\partial_y v_x - \partial_x v_y). \tag{17.74} $$

Thus the statement that $\partial_{\bar z} v_z = 0$ is equivalent to the vector field $\mathbf{v}$ being
both solenoidal (incompressible) and irrotational. This can also be expressed
in form language by setting $\eta = v_z\,dz$ and saying that $d\eta = 0$ means that the
corresponding vector field is both solenoidal and irrotational.


17.3.2. Milne-Thomson circle theorem 


As we mentioned earlier, we can describe an irrotational and incompressible
fluid motion either by a velocity potential

$$ v_x = \partial_x\phi, \qquad v_y = \partial_y\phi, \tag{17.75} $$

where $\mathbf{v}$ is automatically irrotational but incompressibility requires $\nabla^2\phi = 0$,
or by a stream function

$$ v_x = \partial_y\chi, \qquad v_y = -\partial_x\chi, \tag{17.76} $$

where $\mathbf{v}$ is automatically incompressible but irrotationality requires $\nabla^2\chi = 0$.
We can combine these into a single complex stream function $\Phi = \phi + i\chi$
which, for an irrotational incompressible flow, satisfies the Cauchy-Riemann
equations and is therefore an analytic function of $z$. We see that

$$ 2v_z = \frac{d\Phi}{dz}, \tag{17.77} $$

$\phi$ and $\chi$ making equal contributions.

The Milne-Thomson theorem says that if $\Phi$ is the complex stream function
for a flow in unobstructed space, then

$$ \tilde\Phi = \Phi(z) + \bar\Phi\left(\frac{a^2}{z}\right) \tag{17.78} $$

is the stream function after the cylindrical obstacle $|z| = a$ is inserted into
the flow. Here $\bar\Phi(z)$ denotes the analytic function defined by $\bar\Phi(z) = \overline{\Phi(\bar z)}$.
To see that this works, observe that $a^2/z = \bar z$ on the curve $|z| = a$, and so on
this curve $\mathrm{Im}\,\tilde\Phi = \chi = 0$. The surface of the cylinder has therefore become
a streamline, and so the flow does not penetrate into the cylinder. If the
original flow is created by sources and sinks exterior to $|z| = a$, which will be
singularities of $\Phi$, the additional term has singularities that lie only within
$|z| = a$. These will be the “images” of the sources and sinks in the sense of
the “method of images.”

Example: A uniform flow with speed $U$ in the $x$ direction has $\Phi(z) = Uz$.
Inserting a cylinder makes this

$$ \tilde\Phi(z) = U\left(z + \frac{a^2}{z}\right). \tag{17.79} $$

Because $v_z$ is the derivative of this, we see that the perturbing effect of the
obstacle on the velocity field falls off as the square of the distance from the
cylinder. This is a general result for obstructed flows.
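
The construction (17.78) is simple to try out numerically. In the Python sketch below, the helper `insert_cylinder`, the flow parameters, and the added unit source at $z = -4$ are illustrative assumptions of ours; the check is that $\mathrm{Im}\,\tilde\Phi$ vanishes on $|z| = a$.

```python
import cmath
import math

def insert_cylinder(phi, a):
    """Return the map z -> Phi(z) + conj(Phi(conj(a^2/z))), as in (17.78)."""
    def phi_tilde(z):
        return phi(z) + phi((a * a / z).conjugate()).conjugate()
    return phi_tilde

U, a = 2.0, 1.5

def uniform(z):
    return U * z                                  # uniform flow in the x direction

def source(z):
    # Unit-strength source at z = -4, outside the cylinder (illustrative choice).
    return cmath.log(z + 4.0) / (2 * math.pi)

flow = insert_cylinder(lambda z: uniform(z) + source(z), a)

# Sample the stream function chi = Im(Phi~) on the cylinder surface.
on_cylinder = [a * cmath.exp(2j * math.pi * k / 12) for k in range(12)]
max_im_on_cylinder = max(abs(flow(z).imag) for z in on_cylinder)

# For the uniform flow alone the result should coincide with (17.79).
z_test = 2 + 1j
uniform_with_cylinder = insert_cylinder(uniform, a)(z_test)
```

The source is placed on the negative real axis so that the argument of the logarithm stays away from its branch cut for all points sampled here.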


Figure 17.7: The real and imaginary parts of the function $z + z^{-1}$ provide the
velocity potentials and streamlines for irrotational incompressible flow past
a cylinder of unit radius.


17.3.3. Blasius and Kutta-Joukowski theorems 


We now derive the celebrated result, discovered independently by Martin
Wilhelm Kutta (1902) and Nikolai Egorovich Joukowski (1906), that the
lift per unit span of an aircraft wing is equal to the product of the density
of the air $\rho$, the circulation $\kappa \equiv \oint \mathbf{v}\cdot d\mathbf{r}$ about the wing, and the forward
velocity U of the wing through the air. Their theory treats the air as being 
incompressible—a good approximation unless the flow-velocities approach 
the speed of sound—and assumes that the wing is long enough that the flow 
can be regarded as being two dimensional. 


Figure 17.8: Flow past an aerofoil.

Begin by recalling how the momentum flux tensor

$$ T_{ij} = \rho v_i v_j + g_{ij}P \tag{17.80} $$


enters fluid mechanics. In cartesian co-ordinates, and in the presence of an
external body force $f_i$ acting on the fluid, the Euler equation of motion for
the fluid is

$$ \rho\left(\partial_t v_i + v^j\partial_j v_i\right) = -\partial_i P + f_i. \tag{17.81} $$

Here $P$ is the pressure and we are distinguishing between co and contravariant
components, although at the moment $g_{ij} \equiv \delta_{ij}$. We can combine Euler’s
equation with the law of mass conservation,

$$ \partial_t\rho + \partial^i(\rho v_i) = 0, \tag{17.82} $$

to obtain

$$ \partial_t(\rho v_i) + \partial^j(\rho v_j v_i + g_{ij}P) = f_i. \tag{17.83} $$

This momentum-tracking equation shows that the external force acts as a
source of momentum, and that for steady flow $f_i$ is equal to the divergence
of the momentum flux tensor:

$$ f_i = \partial^j T_{ij} = g^{kj}\partial_k T_{ij}. \tag{17.84} $$

As we are interested in steady, irrotational motion with uniform density we
may use Bernoulli’s theorem, $P + \frac12\rho|\mathbf{v}|^2 = \text{const.}$, to substitute $-\frac12\rho|\mathbf{v}|^2$ in
place of $P$. (The constant will not affect the momentum flux.) With this
substitution $T_{ij}$ becomes a traceless symmetric tensor:

$$ T_{ij} = \rho\left(v_i v_j - \frac12 g_{ij}|\mathbf{v}|^2\right). \tag{17.85} $$

Using $v_z = \frac12(v_x - iv_y)$ and

$$ T_{zz} = \frac{\partial x^i}{\partial z}\frac{\partial x^j}{\partial z}\,T_{ij}, \tag{17.86} $$

together with

$$ x \equiv x^1 = \frac12(z + \bar z), \qquad y \equiv x^2 = \frac{1}{2i}(z - \bar z) \tag{17.87} $$

we find

$$ T \equiv T_{zz} = \frac14\left(T_{xx} - T_{yy} - 2iT_{xy}\right) = \rho(v_z)^2. \tag{17.88} $$

This is the only component of $T_{ij}$ that we will need to consider. $T_{\bar z\bar z}$ is simply
$\bar T$, whereas $T_{z\bar z} = 0 = T_{\bar z z}$ because $T_{ij}$ is traceless.
In our complex co-ordinates, the equation

$$ f_i = g^{kj}\partial_k T_{ij} \tag{17.89} $$

reads

$$ f_z = g^{\bar z z}\partial_{\bar z} T_{zz} + g^{z\bar z}\partial_z T_{z\bar z} = 2\partial_{\bar z} T. \tag{17.90} $$

We see that in steady flow the net momentum flux $\dot P_i$ out of a region $\Omega$ is
given by

$$ \dot P_z = \int_\Omega f_z\,dx\,dy = \frac{1}{2i}\int_\Omega f_z\,d\bar z\,dz = \frac{1}{i}\int_\Omega \partial_{\bar z}T\,d\bar z\,dz = \frac{1}{i}\oint_{\partial\Omega} T\,dz. \tag{17.91} $$

We have used Stokes’ theorem at the last step. In regions where there is no
external force, $T$ is analytic, $\partial_{\bar z}T = 0$, and the integral will be independent
of the choice of contour $\partial\Omega$. We can substitute $T = \rho v_z^2$ to get

$$ \dot P_z = -i\rho\oint_{\partial\Omega} v_z^2\,dz. \tag{17.92} $$

To apply this result to our aerofoil we can take $\partial\Omega$ to be its boundary.
Then $\dot P_z$ is the total force exerted on the fluid by the wing, and, by Newton’s
third law, this is minus the force exerted by the fluid on the wing. The total
force on the aerofoil is therefore

$$ F_z = i\rho\oint_{\partial\Omega} v_z^2\,dz. \tag{17.93} $$

The result (17.93) is often called Blasius’ theorem.

Evaluating the integral in (17.93) is not immediately possible because the
velocity $\mathbf{v}$ on the boundary will be a complicated function of the shape of
the body. We can, however, exploit the contour independence of the integral
and evaluate it over a path encircling the aerofoil at large distance where the
flow field takes the asymptotic form

$$ v_z = U_z + \frac{\kappa}{4\pi i}\,\frac{1}{z} + O\left(\frac{1}{z^2}\right). \tag{17.94} $$

The $O(1/z)$ term is the velocity perturbation due to the air having to flow
round the wing, as with the cylinder in a free flow. To confirm that this flow
has the correct circulation we compute

$$ \oint \mathbf{v}\cdot d\mathbf{r} = \oint v_z\,dz + \oint v_{\bar z}\,d\bar z = \kappa. \tag{17.95} $$

Substituting $v_z$ in (17.93) we find that the $O(1/z^2)$ term cannot contribute as
it cannot affect the residue of any pole. The only part that does contribute
is the cross term that arises from multiplying $U_z$ by $\kappa/(4\pi i z)$. This gives

$$ F_z = i\rho\left(\frac{U_z\kappa}{2\pi i}\right)\oint \frac{dz}{z} = i\rho\kappa U_z \tag{17.96} $$

so that

$$ \frac12(F_x - iF_y) = i\rho\kappa\,\frac12(U_x - iU_y). \tag{17.97} $$

Thus, in conventional co-ordinates, the reaction force on the body is

$$ F_x = \rho\kappa U_y, \qquad F_y = -\rho\kappa U_x. \tag{17.98} $$

The fluid therefore provides a lift force proportional to the product of the
circulation with the asymptotic velocity. The force is at right angles to the
incident airstream, so there is no drag.
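
The passage from Blasius’ theorem (17.93) to the Kutta-Joukowski result (17.98) can be checked numerically. In the Python sketch below, the function `blasius_force`, the numerical values of $\rho$, $\kappa$, $U_x$, $U_y$, and the arbitrary $O(1/z^2)$ coefficient `c2` are all our own illustrative assumptions.

```python
import cmath
import math

def blasius_force(v_z, rho, radius=50.0, n=4000):
    """Evaluate F_z = i*rho * (contour integral of v_z^2 dz) on a large circle.

    Returns (F_x, F_y), using F_z = (F_x - i*F_y)/2.
    """
    total = 0j
    dtheta = 2 * math.pi / n
    for m in range(n):
        z = radius * cmath.exp(1j * (m + 0.5) * dtheta)
        total += v_z(z) ** 2 * (1j * z) * dtheta
    F_z = 1j * rho * total
    return 2 * F_z.real, -2 * F_z.imag

rho, kappa = 1.2, 3.0          # sample density and circulation
U_x, U_y = 40.0, 5.0           # sample asymptotic velocity
U_z = 0.5 * (U_x - 1j * U_y)
c2 = 0.7 - 0.4j                # arbitrary coefficient of the 1/z^2 term

def v_z(z):
    # Asymptotic form (17.94) with an explicit 1/z^2 correction.
    return U_z + kappa / (4j * math.pi * z) + c2 / z**2

F_x, F_y = blasius_force(v_z, rho)
```

Changing `c2` or the contour radius should leave the computed force unchanged, which reflects the statement that only the cross term in $v_z^2$ contributes to the residue.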
The circulation around the wing is determined by the Kutta condition
that the velocity of the flow at the sharp trailing edge of the wing be finite. 
If the wing starts moving into the air and the requisite circulation is not 
yet established then the flow under the wing does not leave the trailing edge 
smoothly but tries to whip round to the topside. The velocity gradients 
become very large and viscous forces become important and prevent the air 
from making the sharp turn. Instead, a starting vortex is shed from the 
trailing edge. Kelvin’s theorem on the conservation of vorticity shows that 
this causes a circulation of equal and opposite strength to be induced about 
the wing. 

For finite wings, the path independence of $\oint \mathbf{v}\cdot d\mathbf{r}$ means that the wings
must leave a pair of trailing wingtip vortices of strength $\kappa$ that connect back
to the starting vortex to form a closed loop. The velocity field induced by the
trailing vortices causes the airstream incident on the aerofoil to come from a
slightly different direction than the asymptotic flow. Consequently, the lift is
not quite perpendicular to the motion of the wing. For finite-length wings, 
therefore, lift comes at the expense of an inevitable induced drag force. The 
work that has to be done against this drag force in driving the wing forwards 
provides the kinetic energy in the trailing vortices. 


17.4 Applications of Cauchy’s theorem 


Cauchy’s theorem provides the Royal Road to complex analysis. It is possible 
to develop the theory without it, but the path is harder going. 


17.4.1 Cauchy’s integral formula 


If $f(z)$ is analytic within and on the boundary of a simply connected domain
$\Omega$, with $\partial\Omega = \Gamma$, and if $\zeta$ is a point in $\Omega$, then, noting that the integrand
has a simple pole at $z = \zeta$ and applying the residue formula, we have Cauchy’s
integral formula

$$ f(\zeta) = \frac{1}{2\pi i}\oint_\Gamma \frac{f(z)}{z-\zeta}\,dz, \qquad \zeta\in\Omega. \tag{17.99} $$

Figure 17.9: Cauchy contour.


This formula holds only if $\zeta$ lies within $\Omega$. If it lies outside, then the integrand
is analytic everywhere inside $\Omega$, and so the integral gives zero.

We may show that it is legitimate to differentiate under the integral sign
in Cauchy’s formula. If we do so $n$ times, we have the useful corollary that

$$ f^{(n)}(\zeta) = \frac{n!}{2\pi i}\oint_\Gamma \frac{f(z)}{(z-\zeta)^{n+1}}\,dz. \tag{17.100} $$

This shows that being once differentiable (analytic) in a region automatically
implies that $f(z)$ is differentiable arbitrarily many times!
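
Formulas (17.99) and (17.100) lend themselves to a direct numerical test. In this Python sketch, the helper `cauchy_derivative`, the circular contour, and the choice $f(z) = e^z$ are illustrative assumptions of ours.

```python
import cmath
import math

def cauchy_derivative(f, zeta, order=0, center=0j, radius=1.0, n=2000):
    """Evaluate (order!/(2*pi*i)) * contour integral of f(z)/(z - zeta)^(order+1).

    The contour is the circle |z - center| = radius, traversed anticlockwise.
    """
    total = 0j
    dtheta = 2 * math.pi / n
    for m in range(n):
        w = radius * cmath.exp(1j * (m + 0.5) * dtheta)
        z = center + w
        total += f(z) / (z - zeta) ** (order + 1) * (1j * w) * dtheta
    return math.factorial(order) * total / (2j * math.pi)

zeta_inside = 0.3 - 0.2j
# Every derivative of exp is exp itself, so all of these should agree.
values = [cauchy_derivative(cmath.exp, zeta_inside, order=k) for k in range(4)]

# A point outside the contour: the integrand is analytic inside, so we get zero.
value_outside = cauchy_derivative(cmath.exp, 2.0 + 0j, order=0)
```

The point outside the contour returns zero, in agreement with the remark following (17.99).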


Exercise 17.6: The generalized Cauchy formula. Suppose that we have solved a
D-bar problem (see exercise 17.3), and so found an $F(z,\bar z)$ with $\partial_{\bar z}F = f(z,\bar z)$
in a region $\Omega$. Compute the exterior derivative of

$$ \frac{F(z,\bar z)}{z-\zeta} $$

using (17.56). Now, manipulating formally with delta functions, apply Stokes’
theorem to show that, for $(\zeta,\bar\zeta)$ in the interior of $\Omega$, we have

$$ F(\zeta,\bar\zeta) = \frac{1}{2\pi i}\oint_{\partial\Omega} \frac{F(z,\bar z)}{z-\zeta}\,dz - \frac{1}{\pi}\int_\Omega \frac{f(z,\bar z)}{z-\zeta}\,dx\,dy. $$

This is called the generalized Cauchy formula. Note that the first term on the
right, unlike the second, is a function only of $\zeta$, and so is analytic.


Liouville’s theorem 


A dramatic corollary of Cauchy’s integral formula is provided by
Liouville’s theorem: If $f(z)$ is analytic in all of $\mathbb{C}$, and is bounded there,
meaning that there is a positive real number $K$ such that $|f(z)| < K$, then
$f(z)$ is a constant.

This result provides a powerful strategy for proving that two formulae,
$f_1(z)$ and $f_2(z)$, represent the same analytic function. If we can show that
the difference $f_1 - f_2$ is analytic and tends to zero at infinity then Liouville’s
theorem tells us that $f_1 = f_2$.

Because the result is perhaps unintuitive, and because the methods are
typical, we will spell out in detail how Liouville’s theorem works. We select
any two points, $z_1$ and $z_2$, and use Cauchy’s formula to write

$$ f(z_1) - f(z_2) = \frac{1}{2\pi i}\oint_\Gamma \left(\frac{1}{z-z_1} - \frac{1}{z-z_2}\right) f(z)\,dz. \tag{17.101} $$

We take the contour $\Gamma$ to be a circle of radius $\rho$ centered on $z_1$. We make
$\rho > 2|z_1 - z_2|$, so that when $z$ is on $\Gamma$ we are sure that $|z - z_2| > \rho/2$.


Figure 17.10: Contour for Liouville’s theorem.


Then, using $|\int f(z)\,dz| \le \int |f(z)|\,|dz|$, we have

$$ |f(z_1) - f(z_2)| = \frac{1}{2\pi}\left|\oint_\Gamma \frac{(z_1 - z_2)}{(z-z_1)(z-z_2)}\,f(z)\,dz\right| \le \frac{1}{2\pi}\int_0^{2\pi} \frac{|z_1 - z_2|K}{\rho/2}\,d\theta = \frac{2|z_1 - z_2|K}{\rho}. \tag{17.102} $$

The right hand side can be made arbitrarily small by taking $\rho$ large enough,
so we must have $f(z_1) = f(z_2)$. As $z_1$ and $z_2$ were any pair of points, we
deduce that $f(z)$ takes the same value everywhere.


Exercise 17.7: Let $a_1,\ldots,a_N$ be $N$ distinct complex numbers. Use Liouville’s
theorem to prove that

$$ \sum_{k\ne j}\sum_{j=1}^{N} \frac{1}{(z-a_j)}\,\frac{1}{(z-a_k)^2} = \sum_{k\ne j}\sum_{j=1}^{N} \frac{1}{(a_k-a_j)}\,\frac{1}{(z-a_k)^2}. $$

17.4.2 Taylor and Laurent series 


We have defined a function to be analytic in a domain D if it is (once) 
complex differentiable at all points in D. It turned out that this apparently 
mild requirement automatically implied that the function is differentiable 
arbitrarily many times in D. In this section we shall see that knowledge 
of all derivatives of f(z) at any single point in D is enough to completely 
determine the function at any other point in D. Compare this with functions 
of a real variable, for which it is easy to construct examples that are once
but not twice differentiable, and where complete knowledge of the function at a
point, or even in a neighbourhood of a point, tells us absolutely nothing
of the behaviour of the function away from the point or neighbourhood.

The key ingredient in these almost magical properties of complex ana- 
lytic functions is that any analytic function has a Taylor series expansion 
that actually converges to the function. Indeed an alternative definition of 
analyticity is that f(z) be representable by a convergent power series. For 
real variables this is the definition of a real analytic function. 

To appreciate the utility of power series representations we do need to 
discuss some basic properties of power series. Most of these results are ex- 
tensions to the complex plane of what we hope are familiar notions from real 
analysis. 

Consider the power series

$$ \sum_{n=0}^{\infty} a_n(z - z_0)^n \equiv \lim_{N\to\infty} S_N, \tag{17.103} $$

where $S_N$ are the partial sums

$$ S_N = \sum_{n=0}^{N} a_n(z - z_0)^n. \tag{17.104} $$

Suppose that this limit exists (i.e. the series is convergent) for some $z = \zeta$;
then it turns out that the series is absolutely convergent⁵ for any $|z - z_0| < |\zeta - z_0|$.

To establish this absolute convergence we may assume, without loss of
generality, that $z_0 = 0$. Then, convergence of the sum $\sum a_n\zeta^n$ requires that
$|a_n\zeta^n| \to 0$, and thus $|a_n\zeta^n|$ is bounded. In other words, there is a $B$ such
that $|a_n\zeta^n| < B$ for any $n$. We now write

$$ |a_n z^n| = |a_n\zeta^n|\left|\frac{z}{\zeta}\right|^n < B\left|\frac{z}{\zeta}\right|^n. \tag{17.105} $$

The sum $\sum |a_n z^n|$ therefore converges for $|z/\zeta| < 1$, by comparison with a
geometric progression.

This result, that if a power series in $(z - z_0)$ converges at a point then
it converges at all points closer to $z_0$, shows that a power series possesses
some radius of convergence $R$. The series converges for all $|z - z_0| < R$, and
diverges for all $|z - z_0| > R$. What happens on the circle $|z - z_0| = R$ is
usually delicate, and harder to establish. A useful result, however, is Abel’s
theorem, which we will not try to prove. Abel’s theorem says that if the sum
$\sum a_n$ is convergent, and if $A(z) = \sum_{n=0}^{\infty} a_n z^n$ for $|z| < 1$, then

$$ \lim_{z\to 1^-} A(z) = \sum_{n=0}^{\infty} a_n. \tag{17.106} $$

The converse is not true: if $A(z)$ has a finite limit as we approach the circle
of convergence, the corresponding sum need not converge.

By comparison with a geometric progression, we may establish the following
useful formulae giving $R$ for the series $\sum a_n z^n$:

$$ R = \lim_{n\to\infty} \frac{|a_{n-1}|}{|a_n|} = \lim_{n\to\infty} |a_n|^{-1/n}. \tag{17.107} $$

The proof of these formulae is identical to the real-variable version.
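
The two limits in (17.107) can be compared on a series whose radius of convergence is known. In this Python sketch the coefficients $a_n = 2^{-(n+1)}$ (the expansion of $1/(2-z)$ about the origin, with $R = 2$), the helper names, and the truncation orders are our own illustrative choices.

```python
def a(n):
    return 2.0 ** (-(n + 1))       # coefficients of 1/(2 - z) about z = 0

def ratio_estimate(coeff, n):
    """|a_{n-1}| / |a_n|, the first expression in (17.107) at finite n."""
    return abs(coeff(n - 1)) / abs(coeff(n))

def root_estimate(coeff, n):
    """|a_n|^(-1/n), the second expression in (17.107) at finite n."""
    return abs(coeff(n)) ** (-1.0 / n)

def partial_sum(coeff, z, N):
    return sum(coeff(n) * z**n for n in range(N + 1))

R_ratio = ratio_estimate(a, 500)
R_root = root_estimate(a, 500)

# Inside the circle of convergence the partial sums approach 1/(2 - z).
z_in = 1.5j
S_in = partial_sum(a, z_in, 200)

# Outside it the individual terms grow without bound.
z_out = 2.5
term_growth = abs(a(60) * z_out**60) / abs(a(10) * z_out**10)
```

For this series the ratio estimate is exact at every $n$, while the root estimate approaches $R = 2$ only slowly.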


⁵ Recall that absolute convergence of $\sum a_n$ means that $\sum |a_n|$ converges. Absolute
convergence implies convergence, and also allows us to rearrange the order of terms in the
series without changing the value of the sum. Compare this with conditional convergence,
where $\sum a_n$ converges, but $\sum |a_n|$ does not. You may remember that Riemann showed
that the terms of a conditionally convergent series can be rearranged so as to get any
answer whatsoever!

We soon show that the radius of convergence of a power series is the
distance from $z_0$ to the nearest singularity of the function that it represents.

When we differentiate the terms in a power series, and thus take $a_n z^n \to n a_n z^{n-1}$,
this does not alter $R$. This observation suggests that it is legitimate
to evaluate the derivative of the function represented by the power series by
differentiating term-by-term. As a step on the way to justifying this, observe
that if the series converges at $z = \zeta$ and $D_r$ is the domain $|z| < r < |\zeta|$ then,
using the same bound as in the proof of absolute convergence, we have

$$ |a_n z^n| < B\,\frac{|z|^n}{|\zeta|^n} < B\,\frac{r^n}{|\zeta|^n} = M_n, \tag{17.108} $$

where $\sum M_n$ is convergent. As a consequence $\sum a_n z^n$ is uniformly convergent
in $D_r$ by the Weierstrass “M” test. You probably know that uniform
convergence allows the interchange of the order of sums and integrals:
$\int\left(\sum f_n(x)\right)dx = \sum\int f_n(x)\,dx$. For real variables, uniform convergence is
not a strong enough condition for us to safely interchange the order of sums
and derivatives: $\left(\sum f_n(x)\right)'$ is not necessarily equal to $\sum f_n'(x)$. For complex
analytic functions, however, Cauchy’s integral formula reduces the operation
of differentiation to that of integration, and so this interchange is permitted.
In particular we have that if

$$ f(z) = \sum_{n=0}^{\infty} a_n z^n, \tag{17.109} $$

and $R$ is defined by $R = |\zeta|$ for any $\zeta$ for which the series converges, then
$f(z)$ is analytic in $|z| < R$ and

$$ f'(z) = \sum_{n=0}^{\infty} n a_n z^{n-1} \tag{17.110} $$

is also analytic in $|z| < R$.
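
Term-by-term differentiation, as in (17.110), can be illustrated with the geometric series, whose sum and derivative are known in closed form. The Python sketch below uses $a_n = 1$, so that $f(z) = 1/(1-z)$ and $R = 1$; the sample point and truncation order are arbitrary choices of ours.

```python
def series(z, N):
    """Partial sum of sum_n z^n, which converges to 1/(1 - z) for |z| < 1."""
    return sum(z**n for n in range(N + 1))

def series_derivative(z, N):
    """Partial sum of the term-by-term derivative sum_n n*z^(n-1)."""
    return sum(n * z ** (n - 1) for n in range(1, N + 1))

z = 0.4 + 0.3j                     # |z| = 0.5, well inside the unit circle
f_value = series(z, 200)
fprime_value = series_derivative(z, 200)

f_exact = 1 / (1 - z)
fprime_exact = 1 / (1 - z) ** 2
```

Both partial sums should agree with the closed forms to rounding accuracy, since $|z|^{200}$ is negligible at this point.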


Morera’s theorem 


There is a partial converse of Cauchy’s theorem:

Theorem (Morera): If $f(z)$ is defined and continuous in a domain $D$, and
if $\oint_\Gamma f(z)\,dz = 0$ for all closed contours, then $f(z)$ is analytic in $D$. To
prove this we set $F(z) = \int_P^z f(\zeta)\,d\zeta$. The integral is path-independent by the
hypothesis of the theorem, and because $f(z)$ is continuous we can differentiate
with respect to the integration limit to find that $F'(z) = f(z)$. Thus $F(z)$
is complex differentiable, and so analytic. Then, by Cauchy’s formula for
higher derivatives, $F''(z) = f'(z)$ exists, and so $f(z)$ itself is analytic.

A corollary of Morera’s theorem is that if $f_n(z) \to f(z)$ uniformly in $D$,
with all the $f_n$ analytic, then

i) $f(z)$ is analytic in $D$, and

ii) $f_n'(z) \to f'(z)$ uniformly.

We use Morera’s theorem to prove (i) (appealing to the uniform convergence
to justify the interchange of the order of summation and integration),
and use Cauchy’s theorem to prove (ii).


Taylor’s theorem for analytic functions 


Theorem: Let $\Gamma$ be a circle of radius $\rho$ centered on the point $a$. Suppose that 
$f(z)$ is analytic within and on $\Gamma$, and that the point $z = \zeta$ is within $\Gamma$. 
Then $f(\zeta)$ can be expanded as a Taylor series 

$$ f(\zeta) = f(a) + \sum_{n=1}^{\infty} \frac{(\zeta-a)^n}{n!} f^{(n)}(a), \tag{17.111} $$

meaning that this series converges to $f(\zeta)$ for all $\zeta$ such that $|\zeta - a| < \rho$. 
To prove this theorem we use the identity 

$$ \frac{1}{z-\zeta} = \frac{1}{z-a} + \frac{(\zeta-a)}{(z-a)^2} + \cdots + \frac{(\zeta-a)^{N-1}}{(z-a)^N} + \frac{(\zeta-a)^N}{(z-a)^N}\,\frac{1}{z-\zeta} \tag{17.112} $$

and Cauchy’s integral, to write 

$$
\begin{aligned}
f(\zeta) &= \frac{1}{2\pi i}\oint_\Gamma \frac{f(z)}{z-\zeta}\,dz \\
&= \sum_{n=0}^{N-1} \frac{(\zeta-a)^n}{2\pi i}\oint_\Gamma \frac{f(z)}{(z-a)^{n+1}}\,dz + \frac{(\zeta-a)^N}{2\pi i}\oint_\Gamma \frac{f(z)}{(z-a)^N (z-\zeta)}\,dz \\
&= \sum_{n=0}^{N-1} \frac{(\zeta-a)^n}{n!} f^{(n)}(a) + R_N,
\end{aligned} \tag{17.113}
$$

where 

$$ R_N = \frac{(\zeta-a)^N}{2\pi i}\oint_\Gamma \frac{f(z)}{(z-a)^N (z-\zeta)}\,dz. \tag{17.114} $$

This is Taylor’s theorem with remainder. For real variables this is as far as 
we can go. Even if a real function is differentiable infinitely many times, 
there is no reason for the remainder to become small. For analytic functions, 
however, we can show that $R_N \to 0$ as $N \to \infty$. This means that the 
complex-variable Taylor series is convergent, and its limit is actually equal 
to $f(\zeta)$. To show that $R_N \to 0$, recall that $\Gamma$ is a circle of radius $\rho$ centered 
on $z = a$. Let $r = |\zeta - a| < \rho$, and let $M$ be an upper bound for $|f(z)|$ on $\Gamma$. 
(This exists because $f$ is continuous and $\Gamma$ is a compact subset of $\mathbb{C}$.) Then, 
estimating the integral using methods similar to those invoked in our proof 
of Liouville’s Theorem, we find that 

$$ |R_N| \le \frac{r^N}{2\pi}\left(\frac{2\pi\rho M}{\rho^N(\rho-r)}\right). \tag{17.115} $$

As $r < \rho$, this tends to zero as $N \to \infty$. 
We can take $\rho$ as large as we like provided no singularities of 
$f$ end up within, or on, the circle. This confirms the claim made earlier: 
the radius of convergence of the power series representation of an analytic 
function is the distance to the nearest singularity. 
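The remainder estimate can be checked numerically. The sketch below is an illustration under our own assumptions: we take $f(z) = 1/(2-z)$, $a = 0$, $\rho = 1$ (so that $M = 1$ bounds $|f|$ on $\Gamma$), and compare the actual truncation error with the bound $M (r/\rho)^N \rho/(\rho - r)$, which is the right-hand side of (17.115) after cancelling the factors of $2\pi$.

```python
def taylor_partial_sum(zeta, N):
    """First N Taylor terms of f(z) = 1/(2 - z) about a = 0: sum of zeta^n / 2^(n+1)."""
    return sum(zeta**n / 2 ** (n + 1) for n in range(N))


def remainder_bound(r, rho, M, N):
    """Bound on |R_N|: M * (r/rho)^N * rho / (rho - r)."""
    return M * (r / rho) ** N * rho / (rho - r)


zeta = 0.3 + 0.4j          # r = |zeta - a| = 0.5
rho, M = 1.0, 1.0          # on |z| = 1 we have |1/(2 - z)| <= 1/(2 - 1) = 1
r = abs(zeta)
f_exact = 1 / (2 - zeta)

errors = [abs(f_exact - taylor_partial_sum(zeta, N)) for N in range(1, 30)]
bounds = [remainder_bound(r, rho, M, N) for N in range(1, 30)]

for N in (1, 5, 10, 20):
    print(N, errors[N - 1], bounds[N - 1])
```

The bound is far from sharp here, since the nearest singularity sits at $z = 2$ and we could have taken $\rho$ closer to 2, but it does decay geometrically as the proof requires.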


Laurent series 


Theorem (Laurent): Let $\Gamma_1$ and $\Gamma_2$ be two anticlockwise circular paths with 
centre $a$, radii $\rho_1$ and $\rho_2$, and with $\rho_2 < \rho_1$. If $f(z)$ is analytic on the circles 
and within the annulus between them, then, for $\zeta$ in the annulus: 

$$ f(\zeta) = \sum_{n=0}^{\infty} a_n(\zeta - a)^n + \sum_{n=1}^{\infty} b_n(\zeta - a)^{-n}. \tag{17.116} $$

Figure 17.11: Contours for Laurent’s theorem. 

The coefficients $a_n$ and $b_n$ are given by 

$$ a_n = \frac{1}{2\pi i}\oint_{\Gamma_1} \frac{f(z)}{(z-a)^{n+1}}\,dz, \qquad b_n = \frac{1}{2\pi i}\oint_{\Gamma_2} f(z)(z-a)^{n-1}\,dz. \tag{17.117} $$

Laurent’s theorem is proved by observing that 

$$ f(\zeta) = \frac{1}{2\pi i}\oint_{\Gamma_1} \frac{f(z)}{z-\zeta}\,dz - \frac{1}{2\pi i}\oint_{\Gamma_2} \frac{f(z)}{z-\zeta}\,dz, \tag{17.118} $$

and using the identities 

$$ \frac{1}{z-\zeta} = \frac{1}{z-a} + \frac{(\zeta-a)}{(z-a)^2} + \cdots + \frac{(\zeta-a)^{N-1}}{(z-a)^N} + \frac{(\zeta-a)^N}{(z-a)^N}\,\frac{1}{z-\zeta} \tag{17.119} $$

and 

$$ -\frac{1}{z-\zeta} = \frac{1}{\zeta-a} + \frac{(z-a)}{(\zeta-a)^2} + \cdots + \frac{(z-a)^{N-1}}{(\zeta-a)^N} + \frac{(z-a)^N}{(\zeta-a)^N}\,\frac{1}{\zeta-z}. \tag{17.120} $$

Once again we can show that the remainder terms tend to zero. 

Warning: Although the coefficients $a_n$ are given by the same integrals as in 
Taylor’s theorem, they are not interpretable as derivatives of $f$ unless $f(z)$ 
is analytic within the inner circle, in which case all the $b_n$ are zero. 
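The coefficient integrals (17.117) are easy to evaluate numerically, because the trapezoidal rule converges very rapidly for periodic integrands. The following sketch is illustrative only: the function $f(z) = 1/(z(1-z))$, the radii, and the helper name `circle_integral` are our own choices. In the annulus $0 < |z| < 1$ this function equals $1/z + 1 + z + z^2 + \cdots$, so we expect $a_n = 1$, $b_1 = 1$ and $b_n = 0$ for $n \ge 2$.

```python
import cmath
import math


def circle_integral(g, a, radius, n=512):
    """(1/(2*pi*i)) * integral of g(z) dz around the anticlockwise circle
    |z - a| = radius, using the trapezoidal rule with n equally spaced points."""
    total = 0j
    for k in range(n):
        z = a + radius * cmath.exp(2j * math.pi * k / n)
        total += g(z) * (z - a)       # dz = i (z - a) d(theta)
    return total / n


def f(z):
    return 1 / (z * (1 - z))          # analytic in the annulus 0 < |z| < 1


a = 0
rho1, rho2 = 0.8, 0.3                 # outer and inner radii, rho2 < rho1
a_coeffs = [circle_integral(lambda z, n=n: f(z) / (z - a) ** (n + 1), a, rho1)
            for n in range(4)]
b_coeffs = [circle_integral(lambda z, n=n: f(z) * (z - a) ** (n - 1), a, rho2)
            for n in range(1, 4)]

print(a_coeffs)
print(b_coeffs)
```

Here $f$ is not analytic inside the inner circle, so, in line with the warning above, the $a_n$ are not Taylor coefficients of $f$ at $a = 0$.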


17.4.3 Zeros and singularities 


This section is something of a nosology (a classification of diseases), but 
you should study it carefully as there is some tight reasoning here, and the 
conclusions are the essential foundations for the rest of the subject. 
First a review and some definitions: 
a) If $f(z)$ is analytic within a domain $D$, we have seen that $f$ may be 
expanded in a Taylor series about any point $z_0 \in D$: 

$$ f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n. \tag{17.121} $$

If $a_0 = a_1 = \cdots = a_{n-1} = 0$, and $a_n \ne 0$, so that the first non-zero 
term in the series is $a_n(z - z_0)^n$, we say that $f(z)$ has a zero of order $n$ 
at $z_0$. 

b) A singularity of $f(z)$ is a point at which $f(z)$ ceases to be differentiable. 
If $f(z)$ has no singularities at finite $z$ (for example, $f(z) = \sin z$) then 
it is said to be an entire function. 

c) If $f(z)$ is analytic in $D$ except at $z = a$, an isolated singularity, then 
we may draw two concentric circles of centre $a$, both within $D$, and in 
the annulus between them we have the Laurent expansion 

$$ f(z) = \sum_{n=0}^{\infty} a_n(z - a)^n + \sum_{n=1}^{\infty} b_n(z - a)^{-n}. \tag{17.122} $$

The second term, consisting of negative powers, is called the principal 
part of $f(z)$ at $z = a$. It may happen that $b_m \ne 0$ but $b_n = 0$ for $n > m$. 
Such a singularity is called a pole of order $m$ at $z = a$. The coefficient 
$b_1$, which may be 0, is called the residue of $f$ at the pole $z = a$. If the 
series of negative powers does not terminate, the singularity is called 
an isolated essential singularity. 

Now some observations: 

i) Suppose $f(z)$ is analytic in a domain $D$ containing the point $z = a$. 
Then we can expand: $f(z) = \sum a_n(z-a)^n$. If $f(z)$ is zero at $z = a$, 
then there are exactly two possibilities: a) all the $a_n$ vanish, and then 
$f(z)$ is identically zero; b) there is a first non-zero coefficient, $a_m$ say, 
and so $f(z) = (z-a)^m\varphi(z)$, where $\varphi(a) \ne 0$. In the second case $f$ is said to 
possess a zero of order $m$ at $z = a$. 

ii) If $z = a$ is a zero of order $m$ of $f(z)$ then the zero is isolated, i.e. 
there is a neighbourhood of $a$ which contains no other zero. To see this 
observe that $f(z) = (z-a)^m\varphi(z)$ where $\varphi(z)$ is analytic and $\varphi(a) \ne 0$. 
Analyticity implies continuity, and by continuity there is a neighbourhood 
of $a$ in which $\varphi(z)$ does not vanish. 

iii) Limit points of zeros I: Suppose that we know that $f(z)$ is analytic in $D$ 
and we know that it vanishes at a sequence of points $a_1, a_2, a_3, \ldots \in D$. 
If these points have a limit point$^8$ that is interior to $D$ then $f(z)$ must, 
by continuity, be zero there. But this would be a non-isolated zero, in 
contradiction to item ii), unless $f(z)$ actually vanishes identically in $D$. 
This, then, is the only option. 

iv) From the definition of poles, they too are isolated. 

$^8$ A point $z_0$ is a limit point of a set $S$ if for every $\epsilon > 0$ there is some $a \in S$, other than 
$z_0$ itself, such that $|a - z_0| < \epsilon$. A sequence need not have a limit for it to possess one or 
more limit points. 

v) If $f(z)$ has a pole at $z = a$ then $f(z) \to \infty$ as $z \to a$ in any manner. 

vi) Limit points of zeros II: Suppose we know that $f$ is analytic in $D$, 
except possibly at $z = a$ which is a limit point of zeros as in iii), but we 
also know that $f$ is not identically zero. Then $z = a$ must be a singularity 
of $f$, but not a pole (because $f$ would then tend to infinity and could 
not have arbitrarily close zeros), so $a$ must be an isolated essential 
singularity. For example $\sin(1/z)$ has an isolated essential singularity at 
$z = 0$, this being a limit point of the zeros at $z = 1/n\pi$. 

vii) A limit point of poles or other singularities would be a non-isolated 
essential singularity. 
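
As a short worked illustration of these definitions (the examples are our own, chosen only because their Laurent expansions about $z = 0$ can be read off from familiar Taylor series), consider

```latex
\frac{\sin z}{z^{3}} = \frac{1}{z^{2}} - \frac{1}{3!} + \frac{z^{2}}{5!} - \cdots ,
\qquad
\frac{\cos z}{z} = \frac{1}{z} - \frac{z}{2!} + \frac{z^{3}}{4!} - \cdots ,
\qquad
e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2!\,z^{2}} + \frac{1}{3!\,z^{3}} + \cdots .
```

The first function has a pole of order two at $z = 0$ with $b_2 = 1$ and residue $b_1 = 0$, showing that the residue at a pole may indeed vanish. The second has a simple pole with residue 1. The third has a principal part that never terminates, and so $z = 0$ is an isolated essential singularity.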


17.4.4 Analytic continuation 


Suppose that $f_1(z)$ is analytic in the (open, arcwise-connected) domain $D_1$, 
and $f_2(z)$ is analytic in $D_2$, with $D_1 \cap D_2 \ne \emptyset$. Suppose further that $f_1(z) = 
f_2(z)$ in $D_1 \cap D_2$. Then we say that $f_2$ is an analytic continuation of $f_1$ to 
$D_2$. Such analytic continuations are unique: if $f_3$ is also analytic in $D_2$, and 
$f_3 = f_1$ in $D_1 \cap D_2$, then $f_2 - f_3 = 0$ in $D_1 \cap D_2$. Because the intersection 
of two open sets is also open, $f_2 - f_3$ vanishes on an open set and so, by 
observation iii) of the previous section, it vanishes everywhere in $D_2$. 


Figure 17.12: Intersecting domains. 


We can use this uniqueness result, coupled with the circular domains of 
convergence of the Taylor series, to extend the definition of analytic functions 
beyond the domain of their initial definition. 

The distribution $x_+^{\alpha-1}$ 


An interesting and useful example of analytic continuation is provided by the 
distribution $x_+^{\alpha-1}$ which, for real positive $\alpha$, is defined by its evaluation on 
a test function $\varphi(x)$ as 

$$ (x_+^{\alpha-1}, \varphi) = \int_0^\infty x^{\alpha-1}\varphi(x)\,dx. \tag{17.123} $$

The pairing $(x_+^{\alpha-1}, \varphi)$ extends to a complex analytic function of $\alpha$ provided 
the integral converges. Test functions are required to decrease at infinity 
faster than any power of $x$, and so the integral always converges at the upper 
limit. It will converge at the lower limit provided $\mathrm{Re}(\alpha) > 0$. Assume that 
this is so, and integrate by parts using 

$$ \frac{d}{dx}\left(\frac{x^\alpha}{\alpha}\varphi(x)\right) = x^{\alpha-1}\varphi(x) + \frac{x^\alpha}{\alpha}\varphi'(x). \tag{17.124} $$

We find that, for $\epsilon > 0$, 

$$ \left[\frac{x^\alpha}{\alpha}\varphi(x)\right]_\epsilon^\infty = \int_\epsilon^\infty x^{\alpha-1}\varphi(x)\,dx + \int_\epsilon^\infty \frac{x^\alpha}{\alpha}\varphi'(x)\,dx. \tag{17.125} $$

The integrated-out part on the left-hand-side of (17.125) tends to zero as 
we take $\epsilon$ to zero, and both of the integrals converge in this limit as well. 
Consequently 

$$ I_1(\alpha) \equiv -\frac{1}{\alpha}\int_0^\infty x^\alpha \varphi'(x)\,dx \tag{17.126} $$

is equal to $(x_+^{\alpha-1}, \varphi)$ for $0 < \mathrm{Re}(\alpha) < \infty$. However, the integral defining 
$I_1(\alpha)$ converges in the larger region $-1 < \mathrm{Re}(\alpha) < \infty$. It therefore provides 
an analytic continuation to this larger domain. The factor of $1/\alpha$ reveals that 
the analytically-continued function possesses a pole at $\alpha = 0$, with residue 

$$ -\int_0^\infty \varphi'(x)\,dx = \varphi(0). \tag{17.127} $$

We can repeat the integration by parts, and find that 

$$ I_2(\alpha) \equiv \frac{1}{\alpha(\alpha+1)}\int_0^\infty x^{\alpha+1}\varphi''(x)\,dx \tag{17.128} $$

provides an analytic continuation to the region $-2 < \mathrm{Re}(\alpha) < \infty$. By 
proceeding in this manner, we can continue $(x_+^{\alpha-1}, \varphi)$ to a function analytic 
in the entire complex $\alpha$ plane with the exception of zero and the negative 
integers, at which it has simple poles. The residue of the pole at $\alpha = -n$ is 
$\varphi^{(n)}(0)/n!$. 

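There is another, much more revealing, way of expressing these analytic 
continuations. To obtain this, suppose that $\phi \in C^\infty[0,\infty]$ and $\phi \to 0$ at 
infinity at least as fast as $1/x$. (Our test function $\varphi$ decreases much more 
rapidly than this, but $1/x$ is all we need for what follows.) Now the function 

$$ I(\alpha) \equiv \int_0^\infty x^{\alpha-1}\phi(x)\,dx \tag{17.129} $$

is convergent and analytic in the strip $0 < \mathrm{Re}(\alpha) < 1$. By the same reasoning 
as above, $I(\alpha)$ is there equal to 

$$ -\int_0^\infty \frac{x^\alpha}{\alpha}\phi'(x)\,dx. \tag{17.130} $$

Again this new integral provides an analytic continuation to the larger strip 
$-1 < \mathrm{Re}(\alpha) < 1$. But in the left-hand half of this strip, where $-1 < 
\mathrm{Re}(\alpha) < 0$, we can write 

$$
\begin{aligned}
-\int_0^\infty \frac{x^\alpha}{\alpha}\phi'(x)\,dx
&= \lim_{\epsilon\to 0}\left\{\int_\epsilon^\infty x^{\alpha-1}\phi(x)\,dx - \left[\frac{x^\alpha}{\alpha}\phi(x)\right]_\epsilon^\infty\right\} \\
&= \lim_{\epsilon\to 0}\left\{\int_\epsilon^\infty x^{\alpha-1}\phi(x)\,dx + \phi(\epsilon)\frac{\epsilon^\alpha}{\alpha}\right\} \\
&= \lim_{\epsilon\to 0}\left\{\int_\epsilon^\infty x^{\alpha-1}[\phi(x) - \phi(\epsilon)]\,dx\right\} \\
&= \int_0^\infty x^{\alpha-1}[\phi(x) - \phi(0)]\,dx.
\end{aligned} \tag{17.131}
$$

Observe how the integrated out part, which tends to zero in $0 < \mathrm{Re}(\alpha) < 1$, 
becomes divergent in the strip $-1 < \mathrm{Re}(\alpha) < 0$. This divergence is there 
craftily combined with the integral to cancel its divergence, leaving a finite 
remainder. As a consequence, for $-1 < \mathrm{Re}(\alpha) < 0$, the analytic continuation 
is given by 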


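$$ I(\alpha) = \int_0^\infty x^{\alpha-1}[\phi(x) - \phi(0)]\,dx. \tag{17.132} $$

Next we observe that $\chi(x) = [\phi(x) - \phi(0)]/x$ tends to zero as $1/x$ for 
large $x$, and at $x = 0$ can be defined by its limit as $\chi(0) = \phi'(0)$. This $\chi(x)$ 
then satisfies the same hypotheses as $\phi(x)$. With $I(\alpha)$ denoting the analytic 
continuation of the original $I$, we therefore have 

$$
\begin{aligned}
I(\alpha) &= \int_0^\infty x^{\alpha-1}[\phi(x) - \phi(0)]\,dx, && -1 < \mathrm{Re}(\alpha) < 0 \\
&= \int_0^\infty x^{\beta-1}\left[\frac{\phi(x) - \phi(0)}{x}\right]dx, && \text{where } \beta = \alpha + 1, \\
&\to \int_0^\infty x^{\beta-1}\left[\frac{\phi(x) - \phi(0)}{x} - \phi'(0)\right]dx, && -1 < \mathrm{Re}(\beta) < 0 \\
&= \int_0^\infty x^{\alpha-1}[\phi(x) - \phi(0) - x\phi'(0)]\,dx, && -2 < \mathrm{Re}(\alpha) < -1,
\end{aligned} \tag{17.133}
$$

the arrow denoting the same analytic continuation process that we used with 
$\phi$. 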


We can now apply this machinery to our original $\varphi(x)$, and so deduce 
that the analytically-continued distribution is given by 

$$
(x_+^{\alpha-1}, \varphi) =
\begin{cases}
\displaystyle\int_0^\infty x^{\alpha-1}\varphi(x)\,dx, & 0 < \mathrm{Re}(\alpha) < \infty, \\[1ex]
\displaystyle\int_0^\infty x^{\alpha-1}[\varphi(x) - \varphi(0)]\,dx, & -1 < \mathrm{Re}(\alpha) < 0, \\[1ex]
\displaystyle\int_0^\infty x^{\alpha-1}[\varphi(x) - \varphi(0) - x\varphi'(0)]\,dx, & -2 < \mathrm{Re}(\alpha) < -1,
\end{cases} \tag{17.134}
$$

and so on. The analytic continuation automatically subtracts more and more 
terms of the Taylor series of $\varphi(x)$ the deeper we penetrate into the left-hand 
half-plane. This property, that analytic continuation covertly subtracts the 
minimal number of Taylor-series terms required to ensure convergence, lies 
behind a number of physics applications, most notably the method of 
dimensional regularization in quantum field theory. 
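
A numerical check makes the subtraction concrete. The sketch below assumes the particular test function $\varphi(x) = e^{-x}$ (our choice, not one made in the text), for which the pairing is Euler's gamma function $\Gamma(\alpha)$ when $\mathrm{Re}(\alpha) > 0$. At the sample point $\alpha = -1/2$ we evaluate both the integrated-by-parts form (17.126) and the subtracted form in the second line of (17.134), and compare with `math.gamma`. The substitution $x = t^2$, the cutoff `T` and the Simpson helper are implementation conveniences; the code is written for $\alpha = -1/2$ and would need more care at the lower endpoint for other values.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


alpha = -0.5     # sample point in the strip -1 < Re(alpha) < 0
T = 10.0         # cutoff in t = sqrt(x); exp(-T**2) is negligible


# First continuation, (17.126), with phi(x) = exp(-x):
#   I_1(alpha) = (1/alpha) * int_0^inf x^alpha exp(-x) dx.
# Substituting x = t^2 turns x^alpha dx into 2 t^(2 alpha + 1) dt.
def g1(t):
    return 2.0 * t ** (2 * alpha + 1) * math.exp(-t * t)


I1 = simpson(g1, 0.0, T) / alpha


# Subtracted form: int_0^inf x^(alpha-1) [exp(-x) - 1] dx
#                = int_0^inf 2 t^(2 alpha - 1) [exp(-t^2) - 1] dt.
def g2(t):
    if t < 1e-6:                       # small-t behaviour: exp(-t^2) - 1 ~ -t^2
        return -2.0 * t ** (2 * alpha + 1)
    return 2.0 * t ** (2 * alpha - 1) * math.expm1(-t * t)


# Beyond T the integrand equals -2 t^(2 alpha - 1) to machine precision; that
# tail integrates exactly to T^(2 alpha) / alpha when alpha < 0.
I_sub = simpson(g2, 0.0, T) + T ** (2 * alpha) / alpha

gamma_value = math.gamma(alpha)        # Gamma(-1/2) = -2 sqrt(pi)
print(I1, I_sub, gamma_value)
```

All three numbers should agree, illustrating that subtracting $\varphi(0)$ is exactly what analytic continuation into the strip $-1 < \mathrm{Re}(\alpha) < 0$ does.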

The following exercise illustrates some standard techniques of reasoning 
via analytic continuation. 


Exercise 17.8: Define the dilogarithm function by the series 

$$ \mathrm{Li}_2(z) = \frac{z}{1^2} + \frac{z^2}{2^2} + \frac{z^3}{3^2} + \cdots. $$

The radius of convergence of this series is unity, but the domain of $\mathrm{Li}_2(z)$ can 
be extended to $|z| > 1$ by analytic continuation. 


a) Observe that the series converges at $z = \pm 1$, and at $z = 1$ is 

$$ \mathrm{Li}_2(1) = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}. $$

Rearrange the series to show that 

$$ \mathrm{Li}_2(-1) = -\frac{\pi^2}{12}. $$

b) Identify the derivative of the power series for $\mathrm{Li}_2(z)$ with that of an 
elementary function. Exploit your identification to extend the definition 
of $[\mathrm{Li}_2(z)]'$ outside $|z| < 1$. Use the properties of this derivative function, 
together with part a), to prove that the extended function obeys 

$$ \mathrm{Li}_2(-z) + \mathrm{Li}_2\left(-\frac{1}{z}\right) = -\frac{1}{2}(\ln z)^2 - \frac{\pi^2}{6}. $$

This formula allows us to calculate values of the dilogarithm for $|z| > 1$ 
in terms of those with $|z| < 1$. 


Many weird identities involving dilogarithms exist. Some, such as 

$$ \mathrm{Li}_2\left(-\frac{1}{2}\right) + \frac{1}{6}\mathrm{Li}_2\left(\frac{1}{9}\right) = -\frac{\pi^2}{18} + \ln 2\,\ln 3 - \frac{1}{2}(\ln 2)^2 - \frac{1}{3}(\ln 3)^2, $$

were found by Ramanujan. Others, originally discovered by sophisticated 
numerical methods, have been given proofs based on techniques from quantum 
mechanics. Polylogarithms, defined by 

$$ \mathrm{Li}_k(z) = \frac{z}{1^k} + \frac{z^2}{2^k} + \frac{z^3}{3^k} + \cdots, $$

occur frequently when evaluating Feynman diagrams. 
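
The identities quoted in this exercise are easy to test numerically. The following sketch sums the defining series directly where it converges quickly, checks Ramanujan's identity, and then uses the inversion formula of part b) to reach a point outside the unit disc. The function names and the truncation at 200 terms are our own choices.

```python
import math


def li2_series(z, n_terms=200):
    """Partial sum of Li_2(z) = sum_{n>=1} z^n / n^2 (fast only for |z| well below 1)."""
    return sum(z**n / n**2 for n in range(1, n_terms + 1))


def li2_negative(z):
    """Li_2(-z) for real z > 1, via Li_2(-z) + Li_2(-1/z) = -(ln z)^2/2 - pi^2/6."""
    return -li2_series(-1.0 / z) - 0.5 * math.log(z) ** 2 - math.pi**2 / 6


# Ramanujan's identity for Li_2(-1/2) + Li_2(1/9)/6.
lhs = li2_series(-0.5) + li2_series(1.0 / 9.0) / 6.0
rhs = (-math.pi**2 / 18 + math.log(2) * math.log(3)
       - 0.5 * math.log(2) ** 2 - math.log(3) ** 2 / 3)

li2_at_minus_two = li2_negative(2.0)   # a value outside the unit disc

print(lhs, rhs)
print(li2_at_minus_two)
```

The two sides of the identity should agree to rounding error, and the last line gives the continued value of the dilogarithm at $z = -2$.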


17.4.5 Removable singularities and the Weierstrass- 
Casorati theorem 


Sometimes we are given a definition that makes a function analytic in a 
region with the exception of a single point. Can we extend the definition to 
make the function analytic in the entire region? Provided that the function 
is well enough behaved near the point, the answer is yes, and the extension 
is unique. Curiously, the proof that this is so gives us insight into the wild 
behaviour of functions near essential singularities. 

Removable singularities 


Suppose that $f(z)$ is analytic in $D \setminus a$, but that $\lim_{z\to a}(z-a)f(z) = 0$. Then $f$ 
may be extended to a function analytic in all of $D$, i.e. $z = a$ is a removable 
singularity. To see this, let $\zeta$ lie between two simple closed contours $\Gamma_1$ and 
$\Gamma_2$, with $a$ within the smaller, $\Gamma_2$. We use Cauchy to write 

$$ f(\zeta) = \frac{1}{2\pi i}\oint_{\Gamma_1} \frac{f(z)}{z-\zeta}\,dz - \frac{1}{2\pi i}\oint_{\Gamma_2} \frac{f(z)}{z-\zeta}\,dz. \tag{17.135} $$

Now we can shrink $\Gamma_2$ down to be very close to $a$, and because of the condition 
on $f(z)$ near $z = a$, we see that the second integral vanishes. We can also 
arrange for $\Gamma_1$ to enclose any chosen point in $D$. Thus, if we set 

$$ \tilde f(\zeta) = \frac{1}{2\pi i}\oint_{\Gamma_1} \frac{f(z)}{z-\zeta}\,dz \tag{17.136} $$

within $\Gamma_1$, we see that $\tilde f = f$ in $D \setminus a$, and is analytic in all of $D$. The extension 
is unique because any two analytic functions that agree everywhere except 
for a single point must also agree at that point. 


Weierstrass-Casorati 


We apply the idea of removable singularities to show just how pathological 
a beast is an isolated essential singularity: 

Theorem (Weierstrass-Casorati): Let $z = a$ be an isolated essential singularity 
of $f(z)$. Then in any neighbourhood of $a$ the function $f(z)$ comes arbitrarily 
close to any assigned value in $\mathbb{C}$. 

To prove this, define $N_\delta(a) = \{z \in \mathbb{C} : |z-a| < \delta\}$, and $N_\epsilon(\zeta) = \{z \in 
\mathbb{C} : |z-\zeta| < \epsilon\}$. The claim is then that there is a $z \in N_\delta(a)$ such that 
$f(z) \in N_\epsilon(\zeta)$. Suppose that the claim is not true. Then we have $|f(z)-\zeta| \ge \epsilon$ 
for all $z \in N_\delta(a)$. Therefore 

$$ \left|\frac{1}{f(z) - \zeta}\right| \le \frac{1}{\epsilon} \tag{17.137} $$

in $N_\delta(a)$, while $1/(f(z) - \zeta)$ is analytic in $N_\delta(a) \setminus a$. Therefore $z = a$ is a 
removable singularity of $1/(f(z) - \zeta)$, and there is an analytic $g(z)$ which 
coincides with $1/(f(z) - \zeta)$ at all points except $a$. Therefore 

$$ f(z) = \zeta + \frac{1}{g(z)} \tag{17.138} $$

except at $a$. Now $g(z)$, being analytic, may have a zero at $z = a$ giving a 
pole in $f$, but it cannot give rise to an essential singularity. The claim is 
true, therefore. 


Picard’s theorems 


Weierstrass-Casorati is elementary. There are much stronger results: 
Theorem (Picard’s little theorem): Every nonconstant entire function attains 
every complex value with at most one exception. 

Theorem (Picard’s big theorem): In any neighbourhood of an isolated essen- 
tial singularity, f(z) takes every complex value with at most one exception. 
The proofs of these theorems are hard. 

As an illustration of Picard’s little theorem, observe that the function 
$\exp z$ is entire, and takes all values except 0. For the big theorem observe 
that the function $f(z) = \exp(1/z)$ has an essential singularity at $z = 0$, and 
takes all values, with the exception of 0, in any neighbourhood of $z = 0$. 
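
For $\exp(1/z)$ the pre-images can be written down explicitly, which gives a concrete feel for the big theorem: solving $1/z = \ln w + 2\pi i k$ for integer $k$ yields points $z_k$ that accumulate at the origin. The sketch below uses an arbitrary target value $w$ of our own choosing.

```python
import cmath
import math


def preimages_near_zero(w, k_values):
    """Solutions z_k of exp(1/z) = w, one per integer k: 1/z_k = Log(w) + 2 pi i k."""
    log_w = cmath.log(w)                      # principal logarithm; requires w != 0
    return [1 / (log_w + 2j * math.pi * k) for k in k_values]


w = 2 - 3j                                    # an arbitrary non-zero target value
zs = preimages_near_zero(w, [1, 10, 100, 1000])
moduli = [abs(z) for z in zs]
residuals = [abs(cmath.exp(1 / z) - w) for z in zs]

print(moduli)       # shrinking towards zero
print(residuals)    # each z_k really does map to w
```

The excluded value $w = 0$ shows up in the code as the one input for which the logarithm fails.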


17.5 Meromorphic functions and the winding- 
number 


A function whose only singularities in D are poles is said to be meromor- 
phic there. These functions have a number of properties that are essentially 
topological in character. 


17.5.1 Principle of the argument 
If $f(z)$ is meromorphic in $D$ with $\partial D = \Gamma$, and $f(z) \ne 0$ on $\Gamma$, then 

$$ \frac{1}{2\pi i}\oint_\Gamma \frac{f'(z)}{f(z)}\,dz = N - P \tag{17.139} $$

where $N$ is the number of zeros in $D$ and $P$ is the number of poles, each 
counted with its multiplicity. To show 
this, we note that if $f(z) = (z - a)^m\varphi(z)$ where $\varphi$ is analytic and non-zero 
near $a$, then 

$$ \frac{f'(z)}{f(z)} = \frac{m}{z-a} + \frac{\varphi'(z)}{\varphi(z)} \tag{17.140} $$

so $f'/f$ has a simple pole at $a$ with residue $m$. Here $m$ can be either positive 
or negative. The term $\varphi'(z)/\varphi(z)$ is analytic at $z = a$, so collecting all the 
residues from each zero or pole gives the result. 

Since $f'/f = \frac{d}{dz}\ln f$ the integral may be written 

$$ \oint_\Gamma \frac{f'(z)}{f(z)}\,dz = \Delta_\Gamma \ln f(z) = i\Delta_\Gamma \arg f(z), \tag{17.141} $$

the symbol $\Delta_\Gamma$ denoting the total change in the quantity after we traverse $\Gamma$. 
Thus 

$$ N - P = \frac{1}{2\pi}\Delta_\Gamma \arg f(z). \tag{17.142} $$

This result is known as the principle of the argument. 
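
The counting formula (17.139) can be verified numerically by evaluating the contour integral with the trapezoidal rule. In the sketch below the function $f$, with a double zero, a simple zero and a simple pole inside the unit circle, is a hypothetical example of our own, as is the helper name `log_derivative_integral`.

```python
import cmath
import math


def log_derivative_integral(f, df, center, radius, n=2000):
    """(1/(2 pi i)) * integral of f'(z)/f(z) dz around |z - center| = radius,
    traversed anticlockwise, by the trapezoidal rule."""
    total = 0j
    for k in range(n):
        z = center + radius * cmath.exp(2j * math.pi * k / n)
        total += df(z) / f(z) * (z - center)  # dz = i (z - center) d(theta)
    return total / n


# Double zero at 0.3, simple zero at -0.5 and simple pole at 0.1i, all inside
# the unit circle, so N - P should be 3 - 1 = 2.
def f(z):
    return (z - 0.3) ** 2 * (z + 0.5) / (z - 0.1j)


def df(z):
    # f'/f = 2/(z - 0.3) + 1/(z + 0.5) - 1/(z - 0.1i), multiplied back by f
    return f(z) * (2 / (z - 0.3) + 1 / (z + 0.5) - 1 / (z - 0.1j))


count = log_derivative_integral(f, df, 0, 1.0)
print(count)
```

The double zero contributes 2 to the count, which is why zeros and poles must be counted with their multiplicities.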


Local mapping theorem 


Suppose the function $w = f(z)$ maps a region $\Omega$ holomorphically onto a region 
$\Omega'$, and a simple closed curve $\gamma \subset \Omega$ onto another closed curve $\Gamma \subset \Omega'$, which 
will in general have self intersections. Given a point $a \in \Omega'$, we can ask 
ourselves how many points within the simple closed curve $\gamma$ map to $a$. The 
answer is given by the winding number of the image curve $\Gamma$ about $a$. 


Figure 17.13: An analytic map is one-to-one where the winding number is 
unity, but two-to-one at points where the image curve winds twice. 


To see that this is so, we appeal to the principle of the argument as 

$$
\begin{aligned}
\#\ \text{of zeros of } (f - a) \text{ within } \gamma &= \frac{1}{2\pi i}\oint_\gamma \frac{f'(z)}{f(z) - a}\,dz \\
&= \frac{1}{2\pi i}\oint_\Gamma \frac{dw}{w - a} \\
&= n(\Gamma, a),
\end{aligned} \tag{17.143}
$$

where $n(\Gamma, a)$ is called the winding number of the image curve $\Gamma$ about $a$. It 
is equal to 

$$ n(\Gamma, a) = \frac{1}{2\pi}\Delta_\gamma \arg(w - a), \tag{17.144} $$

and is the number of times the image point $w$ encircles $a$ as $z$ traverses the 
original curve $\gamma$. 

Since the number of pre-image points cannot be negative, these winding 
numbers must be non-negative. This means that the holomorphic image of a curve 
winding in the anticlockwise direction is also a curve winding anticlockwise. 

For mathematicians, another important consequence of this result is that 
a holomorphic map is open, i.e. the holomorphic image of an open set is 
itself an open set. The local mapping theorem is therefore sometimes called 
the open mapping theorem. 
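
The relation between pre-image counts and winding numbers can be seen in a small experiment. The sketch below takes $\gamma$ to be the unit circle and $f(z) = z^2$ (both our own illustrative choices) and computes $n(\Gamma, a)$ from (17.144) by adding up the small changes in $\arg(w - a)$ along the image curve.

```python
import cmath
import math


def winding_number(f, a, n=4000):
    """Winding number n(Gamma, a) of the image Gamma = f(unit circle) about a,
    obtained by summing the small changes in arg(w - a) along the curve."""
    total = 0.0
    w_prev = f(1 + 0j) - a
    for k in range(1, n + 1):
        w = f(cmath.exp(2j * math.pi * k / n)) - a
        total += cmath.phase(w / w_prev)      # change in argument, in (-pi, pi]
        w_prev = w
    return round(total / (2 * math.pi))


def square(z):
    return z * z


inside = winding_number(square, 0.25)    # w = 0.25 has pre-images z = +0.5 and -0.5
outside = winding_number(square, 2.0)    # no pre-image inside the unit circle
identity = winding_number(lambda z: z, 0.5)

print(inside, outside, identity)
```

The image of the unit circle under $z \mapsto z^2$ is the unit circle traversed twice, so points inside it have two pre-images and winding number two.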


17.5.2 Rouché’s theorem 


Here we provide an effective tool for locating zeros of functions. 

Theorem (Rouché): Let $f(z)$ and $g(z)$ be analytic within and on a simple 
closed contour $\gamma$. Suppose further that $|g(z)| < |f(z)|$ everywhere on $\gamma$. Then 
$f(z)$ and $f(z) + g(z)$ have the same number of zeros within $\gamma$. 

Before giving the proof, we illustrate Rouché’s theorem by giving its most 
important corollary: the algebraic completeness of the complex numbers, a 
result otherwise known as the fundamental theorem of algebra. This asserts 
that, if $R$ is sufficiently large, a polynomial $P(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0$ 
has exactly $n$ zeros, when counted with their multiplicity, lying within the 
circle $|z| = R$. To prove this note that we can take $R$ sufficiently big that 

$$
\begin{aligned}
|a_n z^n| &= |a_n| R^n \\
&> |a_{n-1}| R^{n-1} + |a_{n-2}| R^{n-2} + \cdots + |a_0| \\
&\ge |a_{n-1} z^{n-1} + a_{n-2} z^{n-2} + \cdots + a_0|,
\end{aligned} \tag{17.145}
$$

on the circle $|z| = R$. We can therefore take $f(z) = a_n z^n$ and $g(z) = 
a_{n-1} z^{n-1} + a_{n-2} z^{n-2} + \cdots + a_0$ in Rouché. Since $a_n z^n$ has exactly $n$ zeros, all 
lying at $z = 0$, within $|z| = R$, we conclude that so does $P(z)$. 
The proof of Rouché is a corollary of the principle of the argument. We 
observe that 

$$
\begin{aligned}
\#\ \text{of zeros of } f + g &= n(\Gamma, 0) \\
&= \frac{1}{2\pi}\Delta_\gamma \arg(f + g) \\
&= \frac{1}{2\pi i}\Delta_\gamma \ln(f + g) \\
&= \frac{1}{2\pi i}\Delta_\gamma \ln f + \frac{1}{2\pi i}\Delta_\gamma \ln(1 + g/f) \\
&= \frac{1}{2\pi}\Delta_\gamma \arg f + \frac{1}{2\pi}\Delta_\gamma \arg(1 + g/f).
\end{aligned} \tag{17.146}
$$

Now $|g/f| < 1$ on $\gamma$, so $1 + g/f$ cannot circle the origin as we traverse $\gamma$. 
As a consequence $\Delta_\gamma \arg(1 + g/f) = 0$. Thus the number of zeros of $f + g$ 
inside $\gamma$ is the same as that of $f$ alone. (Naturally, they are not usually in 
the same places.) 
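
Rouché's theorem can be tried out on a simple polynomial. The sketch below uses the hypothetical example $P(z) = z^4 + 3z + 1$ (our own choice): on $|z| = 2$ the term $z^4$ dominates, while on $|z| = 1$ the term $3z$ dominates. The zero counts are obtained from the change in $\arg P$ around each circle, and the inequality $|g| < |f|$ is only spot-checked at sample points.

```python
import cmath
import math


def zeros_inside(h, radius, n=4000):
    """Number of zeros of an analytic h inside |z| = radius, from the change in
    arg h around the circle (h must not vanish on the circle)."""
    total = 0.0
    prev = h(radius + 0j)
    for k in range(1, n + 1):
        cur = h(radius * cmath.exp(2j * math.pi * k / n))
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))


def rouche_condition(f, g, radius, n=4000):
    """Check |g| < |f| at n sample points on |z| = radius (a spot check only)."""
    pts = [radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]
    return all(abs(g(z)) < abs(f(z)) for z in pts)


def P(z):
    return z**4 + 3 * z + 1


# On |z| = 2: |3z + 1| <= 7 < 16 = |z^4|, so P has as many zeros as z^4.
ok_outer = rouche_condition(lambda z: z**4, lambda z: 3 * z + 1, 2.0)
n_outer = zeros_inside(P, 2.0)

# On |z| = 1: |z^4 + 1| <= 2 < 3 = |3z|, so P has as many zeros as 3z.
ok_inner = rouche_condition(lambda z: 3 * z, lambda z: z**4 + 1, 1.0)
n_inner = zeros_inside(P, 1.0)

print(ok_outer, n_outer, ok_inner, n_inner)
```

Comparing the two counts shows that three of the four zeros lie in the annulus $1 < |z| < 2$.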

The geometric part of this argument is often illustrated by a dog on a 
lead. If the lead has length L, and the dog’s owner stays a distance R > L 
away from a lamp post, then the dog cannot run round the lamp post unless 
the owner does the same. 


Figure 17.14: The curve $\Gamma$ is the image of $\gamma$ under the map $f + g$. If $|g| < |f|$, 
then, as $z$ traverses $\gamma$, $f + g$ winds about the origin the same number of times 
that $f$ does. 


Exercise 17.9: Jacobi Theta Function. The function $\theta(z|\tau)$ is defined for 
$\mathrm{Im}\,\tau > 0$ by the sum 

$$ \theta(z|\tau) = \sum_{n=-\infty}^{\infty} e^{i\pi\tau n^2} e^{2\pi i n z}. $$

Show that $\theta(z+1|\tau) = \theta(z|\tau)$, and $\theta(z+\tau|\tau) = e^{-i\pi\tau - 2\pi i z}\theta(z|\tau)$. Use this 
information and the principle of the argument to show that $\theta(z|\tau)$ has exactly one 
zero in each unit cell of the Bravais lattice comprising the points $z = m + n\tau$; 
$m, n \in \mathbb{Z}$. Show that these zeros are located at $z = (m + 1/2) + (n + 1/2)\tau$. 


Exercise 17.10: Use Rouché’s theorem to find the number of roots of the 
equation $z^5 + 15z + 1 = 0$ lying within the circles, i) $|z| = 2$, ii) $|z| = 3/2$. 


17.6 Analytic functions and topology 


17.6.1 The point at infinity 


Some functions, $f(z) = 1/z$ for example, tend to a fixed limit (here 0) as $z$ 
becomes large, independently of the direction in which we set off towards infinity. 
Others, such as $f(z) = \exp z$, behave quite differently depending on what 
direction we take as $|z|$ becomes large. 

To accommodate the former type of function, and to be able to legitimately 
write $f(\infty) = 0$ for $f(z) = 1/z$, it is convenient to add “$\infty$” to the 
set of complex numbers. Technically, we are constructing the one-point 
compactification of the locally compact space $\mathbb{C}$. We often portray this extended 
complex plane as a sphere $S^2$ (the Riemann sphere), using stereographic 
projection to locate infinity at the north pole, and 0 at the south pole. 


Figure 17.15: Stereographic mapping of the complex plane to the 2-Sphere. 


By the phrase an open neighbourhood of $z$, we mean an open set containing 
$z$. We use the stereographic map to define an open neighbourhood of infinity 
as the stereographic image of an open neighbourhood of the north pole. With 
this definition, the extended complex plane $\mathbb{C} \cup \{\infty\}$ becomes topologically 
a sphere, and in particular, becomes a compact set. 
If we wish to study the behaviour of a function “at infinity,” we use the 
map $z \mapsto \zeta = 1/z$ to bring $\infty$ to the origin, and study the behaviour of the 
function there. Thus the polynomial 

$$ f(z) = a_0 + a_1 z + \cdots + a_N z^N \tag{17.147} $$

becomes 

$$ f(\zeta) = a_0 + a_1\zeta^{-1} + \cdots + a_N\zeta^{-N}, \tag{17.148} $$

and so has a pole of order $N$ at infinity. Similarly, the function $f(z) = z^{-3}$ has 
a zero of order three at infinity, and $\sin z$ has an isolated essential singularity 
there. 

We must be careful about defining residues at infinity. The residue is 
more a property of the 1-form $f(z)\,dz$ than of the function $f(z)$ alone, and 
to find the residue we need to transform the $dz$ as well as $f(z)$. For example, 
if we set $z = 1/\zeta$ in $dz/z$ we have 

$$ \frac{dz}{z} = \zeta\, d\left(\frac{1}{\zeta}\right) = -\frac{d\zeta}{\zeta}, \tag{17.149} $$

so the 1-form $(1/z)\,dz$ has a pole at $z = 0$ with residue 1, and has a pole 
with residue $-1$ at infinity, even though the function $1/z$ has no pole there. 
This 1-form viewpoint is required for compatibility with the residue theorem: 
The integral of $1/z$ around the positively oriented unit circle is simultaneously 
minus the integral of $1/z$ about the oppositely oriented unit circle, now 
regarded as a positively oriented circle enclosing the point at infinity. Thus 
if $f(z)$ has a pole of order $N$ at infinity, and 

$$
\begin{aligned}
f(z) &= \cdots + a_{-2}z^{-2} + a_{-1}z^{-1} + a_0 + a_1 z + a_2 z^2 + \cdots + a_N z^N \\
&= \cdots + a_{-2}\zeta^{2} + a_{-1}\zeta + a_0 + a_1\zeta^{-1} + a_2\zeta^{-2} + \cdots + a_N\zeta^{-N}
\end{aligned} \tag{17.150}
$$

near infinity, then the residue at infinity must be defined to be $-a_{-1}$, and 
not $a_1$ as one might naively have thought. 
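
The sign convention can be checked numerically: the residues at all the finite poles, together with the residue at infinity, must add up to zero. The sketch below uses a hypothetical function of our own with two simple poles and integrates around a large circle by the trapezoidal rule.

```python
import cmath
import math


def residue_sum_inside(f, radius, n=2000):
    """(1/(2 pi i)) * anticlockwise integral of f(z) dz around |z| = radius."""
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        total += f(z) * z                     # dz = i z d(theta)
    return total / n


# Hypothetical example: simple poles at z = 1 (residue 1) and z = -2 (residue 2).
# Near infinity f(z) = 3/z + ..., so a_{-1} = 3.
def f(z):
    return 1 / (z - 1) + 2 / (z + 2)


finite_residues = residue_sum_inside(f, 10.0)   # circle enclosing both finite poles

# The same circle traversed clockwise encloses only the point at infinity,
# so the residue there is minus the anticlockwise result.
residue_at_infinity = -finite_residues

print(finite_residues, residue_at_infinity)
```

The result $-3$ agrees with the rule that the residue at infinity is $-a_{-1}$.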

Once we have allowed $\infty$ as a point in the set we map from, it is only 
natural to add it to the set we map to, in other words to allow $\infty$ as a 
possible value for $f(z)$. We will set $f(a) = \infty$, if $|f(z)|$ becomes unboundedly 
large as $z \to a$ in any manner. Thus, if $f(z) = 1/z$ we have $f(0) = \infty$. 


The map 

$$ w = \left(\frac{z - z_0}{z - z_\infty}\right)\left(\frac{z_1 - z_\infty}{z_1 - z_0}\right) \tag{17.151} $$

takes 

$$
\begin{aligned}
z_0 &\to 0, \\
z_1 &\to 1, \\
z_\infty &\to \infty,
\end{aligned} \tag{17.152}
$$

for example. Using this language, the Möbius maps 

$$ w = \frac{az + b}{cz + d} \tag{17.153} $$

become one-to-one maps of $S^2 \to S^2$. They are the only such globally 
conformal one-to-one maps. When the matrix 

$$ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{17.154} $$

is an element of SU(2), the resulting one-to-one map is a rigid rotation of 
the Riemann sphere. Stereographic projection is thus revealed to be the 
geometric origin of the spinor representations of the rotation group. 
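
The three-point map (17.151) is simple to code. The sketch below builds it for three sample points of our own choosing and confirms where they are sent; the function name `three_point_map` is hypothetical.

```python
def three_point_map(z0, z1, zinf):
    """Return the map (17.151), which sends z0 -> 0, z1 -> 1 and zinf -> infinity."""
    def w(z):
        return (z - z0) / (z - zinf) * (z1 - zinf) / (z1 - z0)
    return w


w = three_point_map(z0=1j, z1=2.0, zinf=-1.0)

images = (w(1j), w(2.0))
near_pole = abs(w(-1.0 + 1e-9))     # grows without bound as z -> zinf

print(images, near_pole)
```

Evaluating `w` exactly at `zinf` raises a division error in floating-point arithmetic, which is the numerical shadow of the statement $w(z_\infty) = \infty$ on the Riemann sphere.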

If an analytic function $f(z)$ has no essential singularities anywhere on 
the Riemann sphere then $f$ is rational, meaning that it can be written as 
$f(z) = P(z)/Q(z)$ for some polynomials $P$, $Q$. 

We begin the proof of this fact by observing that $f(z)$ can have only a 
finite number of poles. If, to the contrary, $f$ had an infinite number of poles 
then the compactness of $S^2$ would ensure that the poles would have a limit 
point somewhere. This would be a non-isolated singularity of $f$, and hence 
an essential singularity. Now suppose we have poles at $z_1, z_2, \ldots, z_N$ with 
principal parts 

$$ \sum_{m=1}^{m_n} \frac{b_{n,m}}{(z - z_n)^m}. $$

If one of the $z_n$ is $\infty$, we first use a Möbius map to move it to some finite 
point. Then 

$$ F(z) = f(z) - \sum_{n=1}^{N}\sum_{m=1}^{m_n} \frac{b_{n,m}}{(z - z_n)^m} \tag{17.155} $$

is everywhere analytic, and therefore continuous, on $S^2$. But $S^2$ being compact 
and $F(z)$ being continuous implies that $F$ is bounded. Therefore, by 
Liouville’s theorem, it is a constant. Thus 

$$ f(z) = \sum_{n=1}^{N}\sum_{m=1}^{m_n} \frac{b_{n,m}}{(z - z_n)^m} + C, \tag{17.156} $$

and this is a rational function. If we made use of a Möbius map to move 
a pole at infinity, we use the inverse map to restore the original variables. 
This manoeuvre does not affect the claimed result because Möbius maps take 
rational functions to rational functions. 

The map $z \mapsto f(z)$ given by the rational function 

$$ f(z) = \frac{P(z)}{Q(z)} = \frac{a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0}{b_n z^n + b_{n-1} z^{n-1} + \cdots + b_0} \tag{17.157} $$

wraps the Riemann sphere $n$ times around the target $S^2$. In other words, it 
is an $n$-to-one map. 


17.6.2. Logarithms and branch cuts 


The function $y = \ln z$ is defined to be the solution to $z = \exp y$. Unfortunately, 
since $\exp 2\pi i = 1$, the solution is not unique: if $y$ is a solution, so is 
$y + 2\pi i$. Another way of looking at this is that if $z = \rho \exp i\theta$, with $\rho$ real 
and positive, then $y = \ln\rho + i\theta$, and the angle $\theta$ has the same $2\pi$ ambiguity. Now there 
is no such thing as a “many valued function.” By definition, a function is a 
machine into which we plug something and get a unique output. To make 
$\ln z$ into a legitimate function we must select a unique $\theta = \arg z$ for each $z$. 
This can be achieved by cutting the $z$ plane along a curve extending from 
the branch point at $z = 0$ all the way to infinity. Exactly where we put 
this branch cut is not important; what is important is that it serve as an 
impenetrable fence preventing us from following the continuous evolution of 
the function along a path that winds around the origin. 

Similar branch cuts serve to make fractional powers single valued. We 
define the power $z^\alpha$ for non-integral $\alpha$ by setting 

$$ z^\alpha = \exp\{\alpha \ln z\} = |z|^\alpha e^{i\alpha\theta}, \tag{17.158} $$

where $z = |z|e^{i\theta}$. For the square root $z^{1/2}$ we get 

$$ z^{1/2} = \sqrt{|z|}\, e^{i\theta/2}, \tag{17.159} $$

where $\sqrt{|z|}$ represents the positive square root of $|z|$. We can therefore make 
this single-valued by a cut from 0 to $\infty$. To make $\sqrt{(z-a)(z-b)}$ single 
valued we only need to cut from $a$ to $b$. (Why? Think this through!) 


We can get away without cuts if we imagine the functions being maps from 
some set other than the complex plane. The new set is called a Riemann 
surface. It consists of a number of copies of the complex plane, one for each 
possible value of our “multivalued function.” The map from this new surface 
is then single-valued, because each possible value of the function is the value 
of the function evaluated at a point on a different copy. The copies of the 
complex plane are called sheets, and are connected to each other in a manner 
dictated by the function. The cut plane may now be thought of as a drawing 
of one level of the multilayered Riemann surface. Think of an architect’s floor 
plan of a spiral-floored multi-story car park: If the architect starts drawing 
at one parking spot and works her way round the central core, at some point 
she will find that the floor has become the ceiling of the part already drawn. 
The rest of the structure will therefore have to be plotted on the plan of the 
next floor up — but exactly where she draws the division between one floor 
and the one above is rather arbitrary. The spiral car-park is a good model 
for the Riemann surface of the $\ln z$ function. See figure 17.16. 


Figure 17.16: Part of the Riemann surface for $\ln z$. Each time we circle the 
origin, we go up one level. 


To see what happens for a square root, follow $z^{1/2}$ along a curve circling the 
branch point singularity at $z = 0$. We come back to our starting point with 
the function having changed sign; a second trip along the same path would 
bring us back to the original value. The square root thus has only two sheets, 
and they are cross-connected as shown in figure 17.17. 


Figure 17.17: Part of the Riemann surface for $\sqrt{z}$. Two copies of $\mathbb{C}$ are 
cross-connected. Circling the origin once takes you to the lower level. A second 
circuit brings you back to the upper level. 
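
The sheet structure of the square root can be explored numerically by following a continuous branch along a closed path. The sketch below is a minimal illustration with our own helper `continue_sqrt`: at each step it keeps whichever of the two roots is closer to the previous value. It circles the origin once and twice for $\sqrt{z}$, and then follows $\sqrt{(z-a)(z-b)}$ around a circle enclosing both branch points, taking $a = -1$, $b = 1$ as sample values.

```python
import cmath
import math


def continue_sqrt(values):
    """Follow a continuous branch of the square root along a list of values:
    at each step choose whichever of the two roots is closer to the previous one."""
    value = cmath.sqrt(values[0])
    for v in values[1:]:
        root = cmath.sqrt(v)                  # principal root; the other is -root
        value = root if abs(root - value) <= abs(root + value) else -root
    return value


n = 1000
one_loop = [cmath.exp(2j * math.pi * k / n) for k in range(n + 1)]
two_loops = [cmath.exp(2j * math.pi * k / n) for k in range(2 * n + 1)]

after_one = continue_sqrt(one_loop)     # starts at sqrt(1) = +1
after_two = continue_sqrt(two_loops)

# sqrt((z - a)(z - b)) with a = -1, b = 1, followed once around |z| = 3.
circle = [3 * cmath.exp(2j * math.pi * k / n) for k in range(n + 1)]
after_pair = continue_sqrt([(z + 1) * (z - 1) for z in circle])
start_pair = cmath.sqrt(8)

print(after_one, after_two, after_pair)
```

One circuit of the origin flips the sign and two circuits restore it, whereas a circuit enclosing both $a$ and $b$ returns the function to its starting value, which is why a cut from $a$ to $b$ suffices.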


In figures 17.16 and 17.17, we have shown the cross-connections being 
made rather abruptly along the cuts. This is not necessary (there is no 
singularity in the function at the cut), but it is often a convenient way 
to think about the structure of the surface. For example, the surface for 
$\sqrt{(z-a)(z-b)}$ also consists of two sheets. If we include the point at infinity, 
this surface can be thought of as two spheres, one inside the other, and 
cross-connected along the cut from $a$ to $b$. 


17.6.3. Topology of Riemann surfaces 


Riemann surfaces often have interesting topology. Indeed much of modern 
algebraic topology emerged from the need to develop tools to understand 
multiply-connected Riemann surfaces. As we have seen, the complex num- 
bers, with the point at infinity included, have the topology of a sphere. The 
$\sqrt{(z-a)(z-b)}$ surface is still topologically a sphere. To see this, imagine
continuously deforming the Riemann sphere by pinching it at the equator
down to a narrow waist. Now squeeze the front and back of the waist together
and (imagining that the surface can pass freely through itself) fold
the upper half of the sphere inside the lower. The result is precisely the
two-sheeted $\sqrt{(z-a)(z-b)}$ surface described above. The Riemann surface
of the function $\sqrt{(z-a)(z-b)(z-c)(z-d)}$, which can be thought of as two
spheres, one inside the other and connected along two cuts, one from a to
b and one from c to d, is, however, a torus. Think of the torus as a bicycle
inner tube. Imagine using the fingers of your left hand to pinch the front and 
back of the tube together and the fingers of your right hand to do the same 
on the diametrically opposite part of the tube. Now fold the tube about the 
pinch lines through itself so that one half of the tube is inside the other, 
and connected to the outer half through two square-root cross-connects. If 
Figure 17.18: The 1-cycles $\alpha$ and $\beta$ on the plane with two square-root branch
cuts. The dashed part of $\alpha$ lies hidden on the second sheet of the Riemann
surface.

Figure 17.19: The 1-cycles $\alpha$ and $\beta$ on the torus.


you have difficulty visualizing this process, figures 17.18 and 17.19 show how
the two 1-cycles, $\alpha$ and $\beta$, that generate the homology group $H_1(T^2)$ appear
when drawn on the plane cut from a to b and c to d, and then when drawn
on the torus. Observe, in figure 17.18, how the curves in the two-sheeted 
plane manage to intersect in only one point, just as they do when drawn on 
the torus in figure 17.19. 

That the topology of the twice-cut plane is that of a torus has important 
consequences. This is because the elliptic integral 


$$ w = I^{-1}(z) = \int_{z_0}^{z} \frac{dt}{\sqrt{(t-a)(t-b)(t-c)(t-d)}} \qquad (17.160) $$


maps the twice-cut z-plane 1-to-1 onto the torus, the latter being considered
as the complex w-plane with the points $w$ and $w + n\omega_1 + m\omega_2$ identified. The
two numbers $\omega_{1,2}$ are given by

$$ \omega_{1,2} = \oint_{\alpha,\beta} \frac{dt}{\sqrt{(t-a)(t-b)(t-c)(t-d)}} \qquad (17.161) $$
and are called the periods of the elliptic function z = I(w). The map
$w \mapsto z = I(w)$ is a genuine function because the original z is uniquely determined
by w. It is doubly periodic because

$$ I(w + n\omega_1 + m\omega_2) = I(w), \qquad n, m \in \mathbb{Z}. \qquad (17.162) $$


The inverse “function” $w = I^{-1}(z)$ is not a genuine function of z, however,
because w increases by $\omega_1$ or $\omega_2$ each time z goes around a curve deformable
into $\alpha$ or $\beta$, respectively. The periods are complicated functions of a, b, c, d.
If you recall our discussion of de Rham’s theorem from chapter 4, you
will see that the $\omega_i$ are the results of pairing the closed holomorphic 1-form

$$ \text{“}dw\text{”} = \frac{dz}{\sqrt{(z-a)(z-b)(z-c)(z-d)}} \in H^1(T^2) \qquad (17.163) $$

with the two generators of $H_1(T^2)$. The quotation marks about dw are
there to remind us that dw is not an exact form, i.e. it is not the exterior
derivative of a single-valued function w. This cohomological interpretation 
of the periods of the elliptic function is the origin of the use of the word 
“period” in the context of de Rham’s theorem. (See section 19.5 for more 
information on elliptic functions.) 

More general Riemann surfaces are oriented 2-manifolds that can be 
thought of as the surfaces of doughnuts with g holes. The number g is called 
the genus of the surface. The sphere has g = 0 and the torus has g = 1. 
The Euler character of the Riemann surface of genus g is $\chi = 2(1-g)$. For
example, figure 17.20 shows a surface of genus three. The surface is in one
piece, so $\dim H_0(M) = 1$. The other Betti numbers are $\dim H_1(M) = 6$ and
$\dim H_2(M) = 1$, so

$$ \chi = \sum_{p=0}^{2} (-1)^p \dim H_p(M) = 1 - 6 + 1 = -4, \qquad (17.164) $$

in agreement with $\chi = 2(1-3) = -4$. For complicated functions, the genus
may be infinite.
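
The bookkeeping in (17.164) is easy to mechanize. The following is a minimal sketch, assuming only the Betti numbers $(1, 2g, 1)$ of a closed orientable surface quoted above; the function names are ours and purely illustrative.

```python
def euler_characteristic(betti):
    """Alternating sum of the Betti numbers b_0, b_1, b_2, ..."""
    return sum((-1) ** p * b for p, b in enumerate(betti))


def betti_numbers_closed_surface(genus):
    """Betti numbers (b_0, b_1, b_2) of a closed orientable surface of given genus."""
    return (1, 2 * genus, 1)


# Compare the alternating sum with chi = 2(1 - g) for the first few genera.
for g in range(4):
    chi = euler_characteristic(betti_numbers_closed_surface(g))
    print(g, chi, 2 * (1 - g))
```

For the genus-three surface of figure 17.20 the two columns both read −4.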

If we have two complex variables z and w then a polynomial relation 
P(z,w) = 0 defines a complex algebraic curve. Except for degenerate cases, 
this one (complex) dimensional curve is simultaneously a two (real) dimen- 
sional Riemann surface. With 


$$ P(z,w) = z^3 + 3w^2 z + w + 3 = 0, \qquad (17.165) $$
Figure 17.20: A surface M of genus 3. The non-bounding 1-cycles $\alpha_i$ and
$\beta_i$ form a basis of $H_1(M)$. The entire surface forms the single 2-cycle that
spans $H_2(M)$.


for example, we can think of z(w) being a three-sheeted function of w defined 
by solving this cubic. Alternatively we can consider w(z) to be the two- 
sheeted function of z obtained by solving the quadratic equation 


$$ w^2 + \frac{1}{3z}\,w + \frac{3 + z^3}{3z} = 0. \qquad (17.166) $$

In each case the branch points will be located where two or more roots
coincide. The roots of (17.166), for example, coincide when

$$ 1 - 12z(3 + z^3) = 0. \qquad (17.167) $$


This quartic equation has four solutions, so there are four square-root branch 
points. Although constructed differently, the Riemann surface for w(z) and 
the Riemann surface for z(w) will have the same genus (in this case g = 1) 
because they really are one and the same object: the algebraic curve
defined by the original polynomial equation. 
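
As a numerical sanity check on the count of branch points, we can locate the four roots of (17.167) by Newton iteration. This is only a sketch: the starting guesses are our own rough estimates (one small root near z = 1/36, where the quartic term is negligible, and three roots near the cube roots of −3, where the constant term is negligible), not something taken from the text.

```python
def f(z):
    # The condition 1 - 12 z (3 + z^3) = 0, rewritten as 12 z^4 + 36 z - 1 = 0.
    return 12 * z**4 + 36 * z - 1


def df(z):
    # Derivative of f, needed for the Newton step.
    return 48 * z**3 + 36


def newton(z, steps=50):
    """Plain Newton iteration z -> z - f(z)/f'(z) from the starting point z."""
    for _ in range(steps):
        z = z - f(z) / df(z)
    return z


r = 3 ** (1 / 3)  # modulus of the three roots with z^3 close to -3
guesses = [
    1 / 36,
    -r,
    r * complex(0.5, 3**0.5 / 2),
    r * complex(0.5, -(3**0.5) / 2),
]
branch_points = [newton(complex(g)) for g in guesses]

for z in branch_points:
    print(z, abs(f(z)))
```

The four roots come out well separated, consistent with there being four distinct square-root branch points.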

In order to capture all its points at infinity, we often consider a complex
algebraic curve as being a subset of $\mathbb{C}P^2$. To do this we make the defining
equation homogeneous by introducing a third co-ordinate. For example, for
(17.165) we make

$$ P(z,w) = z^3 + 3w^2 z + w + 3 \;\to\; P(z,w,v) = z^3 + 3w^2 z + w v^2 + 3v^3. \qquad (17.168) $$

The points where P(z,w,v) = 0 define$^7$ a projective curve lying in $\mathbb{C}P^2$.
Places on this curve where the co-ordinate v is zero are the added points at


$^7$A homogeneous polynomial P(z,w,v) of degree n does not provide a map from
$\mathbb{C}P^2 \to \mathbb{C}$ because $P(\lambda z, \lambda w, \lambda v) = \lambda^n P(z,w,v)$ usually depends on $\lambda$, while the co-
ordinates $(\lambda z, \lambda w, \lambda v)$ and (z,w,v) correspond to the same point in $\mathbb{C}P^2$. The zero set
where P = 0 is, however, well-defined in $\mathbb{C}P^2$.
infinity. Places where v is non-zero (and where we may as well set v = 1)
constitute the original affine curve.
A generic (non-singular) curve

$$ P(z,w) = \sum_{r,s} a_{rs}\, z^r w^s = 0, \qquad (17.169) $$

with its points at infinity included, has genus

$$ g = \frac{1}{2}(d-1)(d-2). \qquad (17.170) $$

Here d = max(r + s) is the degree of the curve. This degree-genus relation
is due to Plücker. It is not, however, trivial to prove. Also not easy to prove
is Riemann’s theorem of 1852 that any finite genus Riemann surface is the
complex algebraic curve associated with some two-variable polynomial.

The two assertions in the previous paragraph seem to contradict each 
other. “Any” finite genus, must surely include g = 2, but how can a genus 
two surface be a complex algebraic curve? There is no integer value of d such 
that (d— 1)(d — 2)/2 = 2. This is where the “non-singular” caveat becomes 
important. An affine curve P(z, w) = 0 is said to be singular at $P = (z_0, w_0)$
if all of

$$ P(z,w), \quad \frac{\partial P}{\partial z}, \quad \frac{\partial P}{\partial w} $$

vanish at P. A projective curve is singular at $P \in \mathbb{C}P^2$ if all of

$$ P(z,w,v), \quad \frac{\partial P}{\partial z}, \quad \frac{\partial P}{\partial w}, \quad \frac{\partial P}{\partial v} $$

are zero there. If the curve has a singular point then it degenerates and
ceases to be a manifold. Now Riemann’s construction does not guarantee
an embedding of the surface into $\mathbb{C}P^2$, only an immersion. The distinction
between these two concepts is that an immersed surface is allowed to self-
intersect, while an embedded one is not. Being a double root of the defining
equation P(z,w) = 0, a point of self-intersection is necessarily a singular
point.

As an illustration of a singular curve, consider our earlier example of the
curve

$$ w^2 = (z-a)(z-b)(z-c)(z-d) \qquad (17.171) $$


whose Riemann surface we know to be a torus once some points are
added at infinity, and when a, b, c, d are all distinct. The degree-genus formula
applied to this degree four curve gives, however, g = 3 instead of the expected
g = 1. This is because the corresponding projective curve

$$ w^2 v^2 = (z - av)(z - bv)(z - cv)(z - dv) \qquad (17.172) $$

has a tacnode singularity at the point (z,w,v) = (0,1,0). Rather than
investigate this rather complicated singularity at infinity, we will consider
the simpler case of what happens if we allow b to coincide with c. When b
and c merge, the finite point $P = (w_0, z_0) = (0, b)$ becomes singular. Near
the singularity, the equation defining our curve looks like

$$ 0 = w^2 - ad\,(z-b)^2, \qquad (17.173) $$

which is the equation of two lines, $w = \sqrt{ad}\,(z-b)$ and $w = -\sqrt{ad}\,(z-b)$,
that intersect at the point (w,z) = (0,b). To understand what is happening
topologically it is first necessary to realize that a complex line is a copy of C
and hence, after the point at infinity is included, is topologically a sphere. A
pair of intersecting complex lines is therefore topologically a pair of spheres
sharing a common point. Our degenerate curve only looks like a pair of
lines near the point of intersection however. To see the larger picture, look
back at the figure of the twice-cut plane where we see that as b approaches
c we have an $\alpha$ cycle of zero total length. A zero length cycle means that
the circumference of the torus becomes zero at P, so that it looks like a 
bent sausage with its two ends sharing the common point P. Instead of two 
separate spheres, our sausage is equivalent to a single two-sphere with two 
points identified. 


Figure 17.21: A degenerate torus is topologically the same as a sphere with 
two points identified. 
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As it stands, such a set is no longer a manifold because any neighbourhood of 
P will contain bits of both ends of the sausage, and therefore cannot be given 
co-ordinates that make it look like a region in $\mathbb{R}^2$. We can, however, simply
agree to delete the common point, and then plug the resulting holes in the 
sausage ends with two distinct points. The new set is again a manifold, and 
topologically a sphere. From the viewpoint of the pair of intersecting lines, 
this construction means that we stay on one line, and ignore the other as it 
passes through. 


A similar resolution of singularities allows us to regard immersed surfaces 
as non-singular manifolds, and it is in this sense that Riemann’s theorem is to
be understood. When n such self-intersection double points are deleted and
replaced by pairs of distinct points, the degree-genus formula becomes

$$ g = \frac{1}{2}(d-1)(d-2) - n, \qquad (17.174) $$


and this can take any integer value. 
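
The apparent paradox and its resolution can be tabulated in a few lines. This is a sketch only; the helper name `genus` and the argument name `n_double_points` are ours.

```python
def genus(d, n_double_points=0):
    """Degree-genus formula (17.174): g = (d-1)(d-2)/2 - n."""
    return (d - 1) * (d - 2) // 2 - n_double_points


# Genera available to non-singular curves of degree 1..6: note that 2 is absent.
smooth_genera = [genus(d) for d in range(1, 7)]
print(smooth_genera)

# A quartic with one double point, or a quintic with four, gives genus two.
print(genus(4, 1), genus(5, 4))
```

The quartic $w^2 = (z-a)(z-b)(z-c)(z-d)$ discussed above shows that the correction is not always a simple count of double points: its tacnode at infinity lowers the genus from 3 to 1.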


17.6.4 Conformal geometry of Riemann surfaces 


In this section we recall Hodge’s theory of harmonic forms from section 13.7.1, 
and see how it looks from a complex-variable perspective. This viewpoint 
reveals a relationship between Riemann surfaces and Riemann manifolds that 
forms an important ingredient in string and conformal field theory. 


Isothermal co-ordinates and complex structure 


Suppose we have a two-dimensional orientable Riemann manifold M with 
metric 


$$ ds^2 = g_{ij}\, dx^i dx^j. \qquad (17.175) $$


In two dimensions g;; has three independent components. When we make a 
co-ordinate transformation we have two arbitrary functions at our disposal, 
and so we can use this freedom to select local co-ordinates in which only one 
independent component remains. The most useful choice is isothermal (also 
called conformal) co-ordinates x,y in which the metric tensor is diagonal, 
$g_{ij} = e^{\sigma}\delta_{ij}$, and so

$$ ds^2 = e^{\sigma}(dx^2 + dy^2). \qquad (17.176) $$

The $e^{\sigma}$ is called the scale factor or conformal factor. If we set $z = x + iy$
and $\bar z = x - iy$ the metric becomes

$$ ds^2 = e^{\sigma(z,\bar z)}\, d\bar z\, dz. \qquad (17.177) $$


We can construct isothermal co-ordinates for some open neighbourhood of
any point in M. If in an overlapping isothermal co-ordinate patch the metric
is

$$ ds^2 = e^{\tau(\zeta,\bar\zeta)}\, d\bar\zeta\, d\zeta, \qquad (17.178) $$

and if the co-ordinates have the same orientation, then in the overlap region
$\zeta$ must be a function only of $z$ and $\bar\zeta$ a function only of $\bar z$. This is so that

$$ e^{\tau(\zeta,\bar\zeta)}\, d\bar\zeta\, d\zeta = e^{\sigma(z,\bar z)} \left|\frac{dz}{d\zeta}\right|^2 d\bar\zeta\, d\zeta \qquad (17.179) $$

without any $d\zeta^2$ or $d\bar\zeta^2$ terms appearing. A manifold with an atlas of complex
charts whose change-of-co-ordinate formulae are holomorphic in this way is
said to be a complex manifold, and the co-ordinates endow it with a complex
structure. The existence of a global complex structure allows us to define
the notion of meromorphic and rational functions on M. Our Riemann
manifold is therefore also a Riemann surface.

While any compact, orientable, two-dimensional Riemann manifold has
a complex structure that is determined by the metric, the mapping metric
$\to$ complex structure is not one-to-one. Two metrics $g_{ij}$, $\bar g_{ij}$ that are related
by a conformal scale factor

$$ \bar g_{ij} = \lambda(x^1, x^2)\, g_{ij} \qquad (17.180) $$


give rise to the same complex structure. Conversely, a pair of two-dimensional 
Riemann manifolds having the same complex structure have metrics that are 
related by a scale factor. 

The use of isothermal co-ordinates simplifies many computations. Firstly,
observe that $\sqrt{g}\,g^{ij} = \delta_{ij}$, the conformal factor having cancelled. If you look
back at its definition, you will see that this means that when the Hodge “$\star$”
map acts on one-forms, the result is independent of the metric. If $\omega$ is a
one-form

$$ \omega = p\,dx + q\,dy, \qquad (17.181) $$

then

$$ {\star}\omega = -q\,dx + p\,dy. \qquad (17.182) $$
Note that, on one-forms,

$$ {\star}{\star} = -1. \qquad (17.183) $$

With $z = x + iy$, $\bar z = x - iy$, we have

$$ \omega = \frac12(p - iq)\,dz + \frac12(p + iq)\,d\bar z. \qquad (17.184) $$

Let us focus on the dz part:

$$ A = \frac12(p - iq)\,dz = \frac12(p - iq)(dx + i\,dy). \qquad (17.185) $$

Then

$$ {\star}A = \frac12(p - iq)(dy - i\,dx) = -iA. \qquad (17.186) $$

Similarly, with

$$ B = \frac12(p + iq)\,d\bar z \qquad (17.187) $$

we have

$$ {\star}B = iB. \qquad (17.188) $$

Thus the $dz$ and $d\bar z$ parts of the original form are separately eigenvectors of $\star$
with different eigenvalues. We use this observation to construct a resolution
of the identity Id into the sum of two projection operators

$$ \mathrm{Id} = \frac12(1 + i{\star}) + \frac12(1 - i{\star}) = P + \bar P, \qquad (17.189) $$

where $P$ projects on the $dz$ part and $\bar P$ onto the $d\bar z$ part of the form.
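
The eigenvalue and projector statements can be checked at a single point by pure component arithmetic. In the sketch below a one-form $p\,dx + q\,dy$ is stored as the pair `(p, q)`; this representation and all function names are our own illustrative choices.

```python
def star(form):
    """Hodge star in isothermal co-ordinates: p dx + q dy -> -q dx + p dy."""
    p, q = form
    return (-q, p)


def scale(c, form):
    """Multiply a one-form by the complex number c."""
    return (c * form[0], c * form[1])


def add(f1, f2):
    """Sum of two one-forms."""
    return (f1[0] + f2[0], f1[1] + f2[1])


def project_dz(form):
    """P = (1 + i*)/2: keeps the dz part of the form."""
    return scale(0.5, add(form, scale(1j, star(form))))


def project_dzbar(form):
    """P-bar = (1 - i*)/2: keeps the d(z-bar) part of the form."""
    return scale(0.5, add(form, scale(-1j, star(form))))


def close(f1, f2, tol=1e-12):
    """Component-wise comparison of two one-forms."""
    return abs(f1[0] - f2[0]) < tol and abs(f1[1] - f2[1]) < tol


omega = (2.0, 3.0)            # p = 2, q = 3 at some point
A = project_dz(omega)         # (1/2)(p - iq)(dx + i dy)
B = project_dzbar(omega)      # (1/2)(p + iq)(dx - i dy)
print(A, B)
```

With these definitions `star(A)` equals `-1j * A`, `star(B)` equals `+1j * B`, and `A + B` reproduces `omega`, in line with (17.186), (17.188) and (17.189).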

The original form is harmonic if it is both closed, $d\omega = 0$, and co-closed,
$d{\star}\omega = 0$. Thus, in two dimensions, the notion of being harmonic (i.e. a
solution of Laplace’s equation) is independent of what metric we are given.
If $\omega$ is a harmonic form, then $(p - iq)\,dz$ and $(p + iq)\,d\bar z$ are separately closed.
Observe that $(p - iq)\,dz$ being closed means that $\partial_{\bar z}(p - iq) = 0$, and so $p - iq$
is a holomorphic (and hence harmonic) function. Since both $(p - iq)$ and $dz$
depend only on z, we will call $(p - iq)\,dz$ a holomorphic 1-form. The complex
conjugate form

$$ \overline{(p - iq)\,dz} = (p + iq)\,d\bar z \qquad (17.190) $$

then depends only on $\bar z$ and is anti-holomorphic.




Riemann bilinear relations 


As an illustration of the interplay of harmonic forms and two-dimensional
topology, we derive some famous formulae due to Riemann. These formulae
have applications in string theory and in conformal field theory.

Suppose that M is a Riemann surface of genus g, with $\alpha_i$, $\beta_i$, $i = 1,\ldots,g$,
the representative generators of $H_1(M)$ that intersect as shown in figure
17.20. By applying Hodge-de Rham to this surface, we know that we can
select a set of 2g independent, real, harmonic, 1-forms as a basis of $H^1(M,\mathbb{R})$.
With the aid of the projector P we can assemble these into g holomorphic
closed 1-forms $\omega_i$, together with g anti-holomorphic closed 1-forms $\bar\omega_i$, the
original 2g real forms being recovered from these as $\omega_i + \bar\omega_i$ and
${\star}(\omega_i + \bar\omega_i) = i(\bar\omega_i - \omega_i)$. A physical interpretation of these forms is as the $z$ and
$\bar z$ components of irrotational and incompressible fluid flows on the surface
M. It is not surprising that such flows form a 2g real dimensional, or g
complex dimensional, vector space because we can independently specify the
circulation $\oint \mathbf{v}\cdot d\mathbf{r}$ around each of the 2g generators of $H_1(M)$. If the flow field
has (covariant) components $v_x$, $v_y$, then $\omega = v_z\,dz$ where $v_z = (v_x - iv_y)/2$,
and $\bar\omega = v_{\bar z}\,d\bar z$ where $v_{\bar z} = (v_x + iv_y)/2$.

Suppose now that a and b are closed 1-forms on M. Then, either by
exploiting the powerful and general intersection-form formula (13.77) or by
cutting open the surface along the curves $\alpha_i$, $\beta_i$ and using the more direct
strategy that gave us (13.79), we find that

$$ \int_M a \wedge b = \sum_{i=1}^{g} \left\{ \int_{\alpha_i} a \int_{\beta_i} b - \int_{\beta_i} a \int_{\alpha_i} b \right\}. \qquad (17.191) $$


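We use this formula to derive two bilinear relations associated with a closed
holomorphic 1-form $\omega$. Firstly we compute its Hodge inner-product norm

$$ \|\omega\|^2 \equiv \int_M \omega \wedge {\star}\bar\omega = i\int_M \omega\wedge\bar\omega = i\sum_{i=1}^{g}\left\{\int_{\alpha_i}\omega\int_{\beta_i}\bar\omega - \int_{\beta_i}\omega\int_{\alpha_i}\bar\omega\right\} = i\sum_{i=1}^{g}\left\{A_i\bar B_i - B_i\bar A_i\right\}, \qquad (17.192) $$

where $A_i = \int_{\alpha_i}\omega$ and $B_i = \int_{\beta_i}\omega$. We have used the fact that $\bar\omega$ is an anti-
holomorphic 1-form and thus an eigenvector of $\star$ with eigenvalue $i$. It follows,
therefore, that if all the $A_i$ are zero then $\|\omega\| = 0$ and so $\omega = 0$.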

Let $A_{ij} = \int_{\alpha_i}\omega_j$. The determinant of the matrix $A_{ij}$ is non-zero: if it
were zero, then there would be numbers $\lambda_j$, not all zero, such that

$$ 0 = A_{ij}\lambda_j = \int_{\alpha_i}(\omega_j\lambda_j), \qquad (17.193) $$

but, by (17.192), this implies that $\|\omega_j\lambda_j\| = 0$ and hence $\omega_j\lambda_j = 0$, contrary
to the linear independence of the $\omega_i$. We can therefore solve the equations

$$ A_{ij}\Lambda_{jk} = \delta_{ik} \qquad (17.194) $$

for the numbers $\Lambda_{jk}$ and use these to replace each of the $\omega_i$ by the linear
combination $\omega_j\Lambda_{ji}$. The new $\omega_i$ then obey $\int_{\alpha_i}\omega_j = \delta_{ij}$. From now on we
suppose that this has been done.

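Define $\tau_{ij} = \int_{\beta_i}\omega_j$. Observe that $dz\wedge dz = 0$ forces $\omega_i\wedge\omega_j = 0$, and
therefore we have a second relation

$$ 0 = \int_M \omega_m\wedge\omega_n = \sum_{i=1}^{g}\left\{\int_{\alpha_i}\omega_m\int_{\beta_i}\omega_n - \int_{\beta_i}\omega_m\int_{\alpha_i}\omega_n\right\} = \sum_{i=1}^{g}\left\{\delta_{im}\tau_{in} - \tau_{im}\delta_{in}\right\} = \tau_{mn} - \tau_{nm}. \qquad (17.195) $$

The matrix $\tau_{ij}$ is therefore symmetric. A similar computation, using (17.192)
with $A_i = \lambda_i$ and $B_i = \tau_{ij}\lambda_j$, shows that

$$ \|\lambda_i\omega_i\|^2 = 2\,\bar\lambda_i\,(\mathrm{Im}\,\tau_{ij})\,\lambda_j, $$

so the matrix $(\mathrm{Im}\,\tau_{ij})$ is positive definite. The set of such symmetric matrices
whose imaginary part is positive definite is called the Siegel upper half-plane.
Not every such matrix corresponds to a Riemann surface, but when it does it
encodes all information about the shape of the Riemann manifold M that is
left invariant under conformal rescaling.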


17.7 Further exercises and problems 


Exercise 17.11: Harmonic partners. Show that the function 


$$ u = \sin x \cosh y + 2\cos x \sinh y $$
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is harmonic. Determine the corresponding analytic function u + iv. 


Exercise 17.12: Möbius Maps. The map

$$ z \mapsto w = \frac{az+b}{cz+d} $$

is called a Möbius transformation. These maps are important because they are
the only one-to-one conformal maps of the Riemann sphere onto itself.


a) Show that two successive Möbius transformations

$$ z' = \frac{az+b}{cz+d}, \qquad z'' = \frac{Az'+B}{Cz'+D} $$

give rise to another Möbius transformation, and show that the rule for
combining them is equivalent to matrix multiplication.

b) Let $z_1$, $z_2$, $z_3$, $z_4$ be complex numbers. Show that a necessary and suffi-
cient condition for the four points to be concyclic is that their cross-ratio

$$ \{z_1, z_2; z_3, z_4\} \stackrel{\rm def}{=} \frac{(z_1 - z_4)(z_3 - z_2)}{(z_1 - z_2)(z_3 - z_4)} $$

be real (Hint: use a well-known property of opposite angles of a cyclic
quadrilateral). Show that Möbius transformations leave the cross-ratio
invariant, and thus take circles into circles.


Exercise 17.13: Hyperbolic geometry. The Riemann metric for the Poincaré-
disc model of Lobachevski’s hyperbolic plane (see exercises 1.7 and 12.13) can
be taken to be

$$ ds^2 = \frac{4|dz|^2}{(1-|z|^2)^2}, \qquad |z|^2 < 1. $$

a) Show that the Möbius transformation

$$ z \mapsto w = e^{i\lambda}\,\frac{z-a}{\bar a z - 1}, \qquad |a| < 1, \quad \lambda\in\mathbb{R}, $$

provides a 1-1 map of the interior of the unit disc onto itself. Show that
these maps form a group.

b) Show that the hyperbolic-plane metric is left invariant under the group
of maps in part (a). Deduce that such maps are orientation-preserving
isometries of the hyperbolic plane.

c) Use the circle-preserving property of the Möbius maps to deduce that
circles in hyperbolic geometry are represented in the Poincaré disc by
Euclidean circles that lie entirely within the disc.




The conformal maps of part (a) are in fact the only orientation preserving 
isometries of the hyperbolic plane. With the exception of circles centered at 
z = 0, the center of the hyperbolic circle does not coincide with the center 
of its representative Euclidean circle. Euclidean circles that are internally 
tangent to the boundary of the unit disc have infinite hyperbolic radius and 
their hyperbolic centers lie on the boundary of the unit disc and hence at 
hyperbolic infinity. They are known as horocycles. 


Exercise 17.14: Rectangle to Ellipse. Consider the map $w \mapsto z = \sin w$. Draw
a picture of the image, in the z plane, of the interior of the rectangle with
corners $u = \pm\pi/2$, $v = \pm\lambda$ ($w = u + iv$). Show which points correspond to
the corners of the rectangle, and verify that the vertex angles remain $\pi/2$. At
what points does the isogonal property fail?


Exercise 17.15: The part of the negative real axis where x < −1 is occupied
by a conductor held at potential $-V_0$. The positive real axis for x > +1
is similarly occupied by a conductor held at potential $+V_0$. The conductors
extend to infinity in both directions perpendicular to the x-y plane, and so
the potential V satisfies the two-dimensional Laplace equation.
a) Find the image in the $\zeta$ plane of the cut z plane where the cuts run from
−1 to −∞ and from +1 to +∞ under the map $z \mapsto \zeta = \sin^{-1} z$.
b) Use your answer from part a) to solve the electrostatic problem and
show that the field lines and equipotentials are conic sections of the form
$ax^2 + by^2 = 1$. Find expressions for a and b for both the field lines and
the equipotentials and draw a labelled sketch to illustrate your results.


Exercise 17.16: Draw the image under the map $z \mapsto w = e^{\pi z/a}$ of the infinite
strip S, consisting of those points $z = x + iy \in \mathbb{C}$ for which 0 < y < a.
Label enough points to show which point in the w plane corresponds to which
in the z plane. Hence or otherwise show that the Dirichlet Green function
$G(x, y; x_0, y_0)$ that obeys

$$ \nabla^2 G = \delta(x - x_0)\,\delta(y - y_0) $$

in S, and $G(x, y; x_0, y_0) = 0$ for (x, y) on the boundary of S, can be written as

$$ G(x, y; x_0, y_0) = \frac{1}{2\pi}\ln\left|\sinh\bigl(\pi(z - z_0)/2a\bigr)\right| + \ldots $$

The dots indicate the presence of a second function, similar to the first, that
you should find. Assume that $(x_0, y_0) \in S$.
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Exercise 17.17: State Laurent’s theorem for functions analytic in an annulus. 
Include formulae for the coefficients of the expansion. Show that, suitably 
interpreted, this theorem reduces to a form of Fourier’s theorem for functions 
analytic in a neighbourhood of the unit circle. 


Exercise 17.18: Laurent Paradox. Show that in the annulus 1 < |z| < 2 the
function

$$ f(z) = \frac{1}{(z-1)(2-z)} $$

has a Laurent expansion in powers of z. Find the coefficients. The part of the
series with negative powers of z does not terminate. Does this mean that f(z)
has an essential singularity at z = 0?


Exercise 17.19: Assuming the following series

$$ \frac{1}{\sinh z} = \frac{1}{z} - \frac{1}{6}z + \frac{7}{360}z^3 + \ldots, $$

evaluate the integral

$$ I = \oint_{|z|=1} \frac{1}{z^2\sinh z}\,dz. $$

Now evaluate the integral

$$ I = \oint_{|z|=4} \frac{1}{z^2\sinh z}\,dz. $$

(Hint: The zeros of sinh z lie at z = nπi.)

Exercise 17.20: State the theorem relating the difference between the number
of poles and zeros of f(z) in a region to the winding number of the argument of
f(z). Hence, or otherwise, evaluate the integral

$$ I = \oint_C \frac{5z^4 + 1}{z^5 + z + 1}\,dz, $$

where C is the circle |z| = 2. Prove, including a statement of any relevant
theorem, any assertions you make about the locations of the zeros of $z^5 + z + 1$.

Exercise 17.21: Arcsine branch cuts. Let $w = \sin^{-1} z$. Show that

$$ w = n\pi \pm i\ln\{iz + \sqrt{1 - z^2}\}, $$

with the ± being selected depending on whether n is odd or even. Where
would you put cuts to ensure that w is a single-valued function? 
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Figure 17.22: Concurrent 1-cycles on a genus-2 surface. 


Figure 17.23: The cut-open genus-2 surface. The superscripts L and R denote 
respectively the left and right sides of each 1-cycle, viewed from the direction 
of the arrow orienting the cycle. 
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Problem 17.22: Cutting open a genus-2 surface. The Riemann surface for the 
function 


$$ y = \sqrt{(z-a_1)(z-a_2)(z-a_3)(z-a_4)(z-a_5)(z-a_6)} $$

has genus g = 2. Such a surface M is sketched in figure 17.22, where the four
independent 1-cycles $\alpha_{1,2}$ and $\beta_{1,2}$ that generate $H_1(M)$ have been drawn so
that they share a common vertex.


a) Realize the genus-2 surface as two copies of $\mathbb{C}\cup\{\infty\}$ cross-connected by
three square-root branch cuts. Sketch how the 1-cycles $\alpha_i$ and $\beta_i$, i = 1, 2
of figure 17.22 appear when drawn on your thrice-cut plane.

b) Cut the surface open along the four 1-cycles, and convince yourself that
the resulting surface is homeomorphic to the octagonal region appearing in
figure 17.23.

c) Apply the direct method that gave us (13.79) to the octagonal region of
part b). Hence show that for closed 1-forms a, b, on the surface we have

$$ \int_M a\wedge b = \sum_{i=1}^{2}\left\{\int_{\alpha_i}a\int_{\beta_i}b - \int_{\beta_i}a\int_{\alpha_i}b\right\}. $$


Chapter 18 


Applications of Complex 
Variables 


In this chapter we will find uses for what we have learned of complex variables.
The applications will range from the elementary to the sophisticated. 


18.1 Contour integration technology 


The goal of contour integration technology is to evaluate ordinary, real- 
variable, definite integrals. We have already met the basic tool, the residue 
theorem: 


Theorem: Let f(z) be analytic within and on the boundary $\Gamma = \partial D$ of a
simply connected domain D, with the exception of a finite number of points
at which the function has poles. Then

$$ \oint_\Gamma f(z)\,dz = \sum_{\text{poles}\,\in D} 2\pi i\,(\text{residue at pole}). $$


18.1.1 Tricks of the trade 


The effective application of the residue theorem is something of an art, but 
there are useful classes of integrals which we can learn to recognize. 


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Rational trigonometric expressions 


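Integrals of the form

$$ \int_0^{2\pi} F(\cos\theta, \sin\theta)\,d\theta \qquad (18.1) $$

are dealt with by writing $\cos\theta = \frac12(z + \bar z)$, $\sin\theta = \frac{1}{2i}(z - \bar z)$ and integrating
around the unit circle. For example, let a, b be real and |b| < a, then

$$ I = \int_0^{2\pi}\frac{d\theta}{a + b\cos\theta} = \frac{2}{i}\oint_{|z|=1}\frac{dz}{bz^2 + 2az + b} = \frac{2}{ib}\oint_{|z|=1}\frac{dz}{(z-\alpha)(z-\beta)}. \qquad (18.2) $$

Since $\alpha\beta = 1$, only one pole is within the contour. This is at

$$ \alpha = (-a + \sqrt{a^2 - b^2})/b. \qquad (18.3) $$

The residue is

$$ \frac{2}{ib}\,\frac{1}{\alpha - \beta} = \frac{1}{i\sqrt{a^2 - b^2}}. \qquad (18.4) $$

Therefore, the integral is given by

$$ I = \frac{2\pi}{\sqrt{a^2 - b^2}}. \qquad (18.5) $$

These integrals are, of course, also do-able by the “t” substitution t =
tan(θ/2), whence

$$ \sin\theta = \frac{2t}{1+t^2}, \qquad \cos\theta = \frac{1-t^2}{1+t^2}, \qquad d\theta = \frac{2\,dt}{1+t^2}, \qquad (18.6) $$

followed by a partial fraction decomposition. The labour is perhaps slightly
less using the contour method.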
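
A quick numerical check of (18.5) is reassuring. The sketch below compares the closed form with an equally spaced Riemann sum over one period, a rule that is extremely accurate for smooth periodic integrands; the function names and the test values of a and b are our own choices.

```python
import math


def trig_integral_numeric(a, b, n=2000):
    """Equally spaced sum for the integral of 1/(a + b cos(theta)) over [0, 2 pi]."""
    h = 2 * math.pi / n
    return h * sum(1.0 / (a + b * math.cos(k * h)) for k in range(n))


def trig_integral_residue(a, b):
    """Closed form (18.5), valid for |b| < a."""
    return 2 * math.pi / math.sqrt(a * a - b * b)


for a, b in [(2.0, 1.0), (3.0, -2.0), (1.5, 0.25)]:
    print(a, b, trig_integral_numeric(a, b), trig_integral_residue(a, b))
```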


Rational functions 


Integrals of the form 

$$ \int_{-\infty}^{\infty} R(x)\,dx, \qquad (18.7) $$

where R(x) is a rational function of x with the degree of the denominator
exceeding the degree of the numerator by two or more, may be evaluated
by integrating around a rectangle from −A to +A, A to A + iB, A + iB to
−A + iB, and back down to −A. Because the integrand decreases at least
as fast as $1/|z|^2$ as z becomes large, we see that if we let A, B → ∞, the
contributions from the unwanted parts of the contour become negligible.
Thus

$$ I = 2\pi i\left(\sum \text{Residues of poles in upper half-plane}\right). \qquad (18.8) $$

We could also use a rectangle in the lower half-plane with the result

$$ I = -2\pi i\left(\sum \text{Residues of poles in lower half-plane}\right). \qquad (18.9) $$

This must give the same answer.
For example, let n be a positive integer and consider

$$ I = \int_{-\infty}^{\infty}\frac{dx}{(1+x^2)^n}. \qquad (18.10) $$

The integrand has an n-th order pole at z = ±i. Suppose we close the contour
in the upper half-plane. The new contour encloses the pole at z = +i and
we therefore need to compute its residue. We set $z - i = \zeta$ and expand

$$ \frac{1}{(1+z^2)^n} = \frac{1}{[(i+\zeta)^2 + 1]^n} = \frac{1}{(2i\zeta)^n}\left(1 - \frac{i\zeta}{2}\right)^{-n} = \frac{1}{(2i\zeta)^n}\left(1 + n\left(\frac{i\zeta}{2}\right) + \frac{n(n+1)}{2!}\left(\frac{i\zeta}{2}\right)^2 + \cdots\right). \qquad (18.11) $$

The coefficient of $\zeta^{-1}$ is

$$ \frac{1}{(2i)^n}\,\frac{n(n+1)\cdots(2n-2)}{(n-1)!}\left(\frac{i}{2}\right)^{n-1} = \frac{1}{2^{2n-1}\,i}\,\frac{(2n-2)!}{((n-1)!)^2}. \qquad (18.12) $$

The integral is therefore

$$ I = \frac{\pi}{2^{2n-2}}\,\frac{(2n-2)!}{((n-1)!)^2}. \qquad (18.13) $$

These integrals can also be done by partial fractions.
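
The result (18.13) can be checked numerically. In this sketch we assume the substitution $x = \tan\theta$, which is our own device and not part of the derivation above: it turns (18.10) into the integral of $\cos^{2n-2}\theta$ over $(-\pi/2, \pi/2)$, which the midpoint rule handles very accurately.

```python
import math


def rational_integral_numeric(n, steps=1000):
    """Midpoint rule for the integral of cos^(2n-2) over (-pi/2, pi/2),
    which equals the integral of dx/(1+x^2)^n over the real line (x = tan theta)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2 + (k + 0.5) * h
        total += math.cos(theta) ** (2 * n - 2)
    return h * total


def rational_integral_residue(n):
    """Closed form (18.13)."""
    return (math.pi / 2 ** (2 * n - 2)
            * math.factorial(2 * n - 2) / math.factorial(n - 1) ** 2)


for n in range(1, 7):
    print(n, rational_integral_numeric(n), rational_integral_residue(n))
```

For n = 1 both columns give π, and for n = 2 they give π/2.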
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18.1.2. Branch-cut integrals 
Integrals of the form

$$ I = \int_0^{\infty} x^{\alpha-1}R(x)\,dx, \qquad (18.14) $$

where R(x) is rational, can be evaluated by integration round a slotted circle
(or “key-hole”) contour.


Figure 18.1: A slotted circle contour $\Gamma$ of outer radius A and inner radius $\epsilon$.


A little more work is required to extract the answer, though.
For example, consider

$$ I = \int_0^{\infty}\frac{x^{\alpha-1}}{1+x}\,dx, \qquad 0 < \mathrm{Re}\,\alpha < 1. \qquad (18.15) $$

The restrictions on the range of $\alpha$ are necessary for the integral to converge
at its upper and lower limits.

We take $\Gamma$ to be a circle of radius A centred at z = 0, with a slot indenta-
tion designed to exclude the positive real axis, which we take as the branch
cut of $z^{\alpha-1}$, and a small circle of radius $\epsilon$ about the origin. The branch of
the fractional power is defined by setting

$$ z^{\alpha-1} = \exp[(\alpha - 1)(\ln|z| + i\theta)], \qquad (18.16) $$
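where we will take $\theta$ to be zero immediately above the real axis, and $2\pi$
immediately below it. With this definition the residue at the pole at z = −1
is $e^{i\pi(\alpha-1)}$. The residue theorem therefore tells us that

$$ \oint_\Gamma \frac{z^{\alpha-1}}{1+z}\,dz = 2\pi i\,e^{\pi i(\alpha-1)}. \qquad (18.17) $$

The integral decomposes as

$$ \oint_\Gamma \frac{z^{\alpha-1}}{1+z}\,dz = \oint_{|z|=A}\frac{z^{\alpha-1}}{1+z}\,dz + \left(1 - e^{2\pi i(\alpha-1)}\right)\int_\epsilon^A\frac{x^{\alpha-1}}{1+x}\,dx - \oint_{|z|=\epsilon}\frac{z^{\alpha-1}}{1+z}\,dz, \qquad (18.18) $$

where both circular integrals on the right are taken anticlockwise.

As we send A off to infinity we can ignore the “1” in the denominator com-
pared to the z, and so estimate

$$ \oint_{|z|=A}\frac{z^{\alpha-1}}{1+z}\,dz \approx \oint_{|z|=A} z^{\alpha-2}\,dz \sim A^{\mathrm{Re}\,\alpha - 1}. \qquad (18.19) $$

This tends to zero provided that Re α < 1. Similarly, provided 0 < Re α, the
integral around the small circle about the origin tends to zero with $\epsilon$. Thus

$$ -e^{\pi i\alpha}\,2\pi i = \left(1 - e^{2\pi i(\alpha-1)}\right) I. \qquad (18.20) $$

We conclude that

$$ I = \frac{2\pi i}{e^{\pi i\alpha} - e^{-\pi i\alpha}} = \frac{\pi}{\sin\pi\alpha}. \qquad (18.21) $$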

Exercise 18.1: Using the slotted circle contour, show that

$$ I = \int_0^{\infty}\frac{x^{p-1}}{1+x^2}\,dx = \frac{\pi}{2\sin(\pi p/2)} = \frac{\pi}{2}\,\mathrm{cosec}\,(\pi p/2), \qquad 0 < p < 2. $$


Exercise 18.2: Integrate $z^{a-1}/(z-1)$ around a contour $\Gamma_1$ consisting of a
semicircle in the upper half plane together with the real axis indented at
z = 0 and z = 1

Figure 18.2: The contour $\Gamma_1$.

to get

$$ 0 = \oint_{\Gamma_1}\frac{z^{a-1}}{z-1}\,dz = P\int_0^{\infty}\frac{x^{a-1}}{x-1}\,dx - i\pi + (\cos\pi a + i\sin\pi a)\int_0^{\infty}\frac{x^{a-1}}{x+1}\,dx. $$

As usual, the symbol P in front of the integral sign denotes a principal part
integral, meaning that we must omit an infinitesimal segment of the contour
symmetrically disposed about the pole at z = 1. The term −iπ comes from
integrating around the small semicircle about this point. We get −1/2 of the
residue because we have only a half circle, and that traversed in the “wrong”
direction. Warning: this fractional residue result is only true when we indent
to avoid a simple pole, i.e. one that is of order one.


Now take real and imaginary parts and deduce that

$$ \int_0^{\infty}\frac{x^{a-1}}{1+x}\,dx = \frac{\pi}{\sin\pi a}, \qquad 0 < \mathrm{Re}\,a < 1, $$

and

$$ P\int_0^{\infty}\frac{x^{a-1}}{1-x}\,dx = \pi\cot\pi a, \qquad 0 < \mathrm{Re}\,a < 1. $$


18.1.3. Jordan’s lemma 


We often need to evaluate Fourier integrals

$$ I(k) = \int_{-\infty}^{\infty} e^{ikx}R(x)\,dx \qquad (18.22) $$

with R(x) a rational function. For example, the Green function for the
operator $-\partial_x^2 + m^2$ is given by

$$ G(x) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\frac{e^{ikx}}{k^2 + m^2}. \qquad (18.23) $$

Suppose $x \in \mathbb{R}$ and x > 0. Then, in contrast to the analogous integral
without the exponential function, we have no flexibility in closing the contour
in the upper or lower half-plane. The function $e^{ikx}$ grows without limit as
we head south in the lower half-plane, but decays rapidly in the upper half-
plane. This means that we may close the contour without changing the value
of the integral by adding a large upper-half-plane semicircle.


Figure 18.3: Closing the contour in the upper half-plane.


The modified contour encloses a pole at k = im, at which $e^{ikx}/(k^2 + m^2)$ has
residue $-i\,e^{-mx}/(2m)$. Multiplying by $2\pi i$ and by the $1/2\pi$ in the measure gives

$$ G(x) = \frac{1}{2m}\,e^{-mx}, \qquad x > 0. \qquad (18.24) $$

For x < 0, the situation is reversed, and we must close in the lower half-plane.
The residue of the pole at k = −im is $+i\,e^{mx}/(2m)$, but the change of sign is
cancelled because the contour goes the “wrong way” (clockwise). Thus

$$ G(x) = \frac{1}{2m}\,e^{+mx}, \qquad x < 0. \qquad (18.25) $$

We can combine the two results as

$$ G(x) = \frac{1}{2m}\,e^{-m|x|}. \qquad (18.26) $$
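
The closed form (18.26) can be compared with a direct numerical evaluation of (18.23). The sketch below uses the evenness of $1/(k^2 + m^2)$ to write $G(x) = \frac{1}{\pi}\int_0^\infty \cos(kx)/(k^2 + m^2)\,dk$ and truncates the integral at a cutoff `k_max`; the cutoff, the step count and the function names are our own assumptions, and the truncation limits the accuracy to roughly $1/(|x|\,k_{\max}^2)$, so the check is not meant for x very close to zero.

```python
import math


def green_numeric(x, m, k_max=200.0, n=200000):
    """Trapezoid rule for (1/pi) * integral_0^k_max cos(k x)/(k^2 + m^2) dk."""
    h = k_max / n
    total = 0.5 * (1.0 / (m * m) + math.cos(k_max * x) / (k_max**2 + m * m))
    for j in range(1, n):
        k = j * h
        total += math.cos(k * x) / (k * k + m * m)
    return h * total / math.pi


def green_exact(x, m):
    """Closed form (18.26)."""
    return math.exp(-m * abs(x)) / (2 * m)


for x, m in [(1.0, 1.0), (-0.5, 2.0)]:
    print(x, m, green_numeric(x, m), green_exact(x, m))
```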
The formal proof that the added semicircles make no contribution to the
integral when their radius becomes large is known as Jordan’s Lemma:
Lemma: Let Γ be a semicircle, centred at the origin, and of radius R. Suppose

i) that f(z) is meromorphic in the upper half-plane;

ii) that f(z) tends uniformly to zero as |z| → ∞ for 0 < arg z < π;
iii) the number λ is real and positive.
Then


    \int_\Gamma e^{i\lambda z} f(z)\,dz \to 0,    as R \to \infty.    (18.27)

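To establish this, we assume that R is large enough that |f| < ε on the
contour, and make a simple estimate


    \left| \int_\Gamma e^{i\lambda z} f(z)\,dz \right|
        < 2R\epsilon \int_0^{\pi/2} e^{-\lambda R \sin\theta}\,d\theta
        < 2R\epsilon \int_0^{\pi/2} e^{-2\lambda R\theta/\pi}\,d\theta
        = \frac{\pi\epsilon}{\lambda}\left(1 - e^{-\lambda R}\right) < \frac{\pi\epsilon}{\lambda}.    (18.28)


In the second inequality we have used the fact that (sin θ)/θ > 2/π for angles
in the range 0 < θ < π/2. Since ε can be made as small as we like, the lemma
follows.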
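The key point of the estimate is that 2R ∫ e^{−λR sin θ} dθ stays bounded as R grows, even though the arc length grows like R. The sketch below (our own illustration, with hypothetical function names and a simple midpoint rule) compares this quantity with the closed-form bound (π/λ)(1 − e^{−λR}).

```python
import math

def semicircle_integral(lam, R, n=50_000):
    """Midpoint estimate of 2R * int_0^{pi/2} exp(-lam*R*sin(theta)) d(theta)."""
    h = (math.pi / 2) / n
    total = sum(math.exp(-lam * R * math.sin((j + 0.5) * h)) for j in range(n))
    return 2.0 * R * h * total

def jordan_bound(lam, R):
    """Upper bound (pi/lam) * (1 - exp(-lam*R)) obtained from sin(theta) > 2*theta/pi."""
    return (math.pi / lam) * (1.0 - math.exp(-lam * R))

if __name__ == "__main__":
    for R in (1.0, 10.0, 100.0):
        print(R, semicircle_integral(1.0, R), jordan_bound(1.0, R))
```

For each radius the first number should lie below the second, and neither should exceed π/λ.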

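Example: Evaluate


    I(\alpha) = \int_{-\infty}^{\infty} \frac{\sin(\alpha x)}{x}\,dx.    (18.29)


We have

    I(\alpha) = \mathrm{Im} \left\{ \int_{-\infty}^{\infty} \frac{\exp i\alpha z}{z}\,dz \right\}.    (18.30)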


If we take α > 0, we can close in the upper half-plane, but our contour must
exclude the pole at z = 0. Therefore, with both semicircles taken
counterclockwise,


    0 = \int_{|z|=R} \frac{\exp i\alpha z}{z}\,dz - \int_{|z|=\epsilon} \frac{\exp i\alpha z}{z}\,dz
        + \int_{-R}^{-\epsilon} \frac{\exp i\alpha x}{x}\,dx + \int_{\epsilon}^{R} \frac{\exp i\alpha x}{x}\,dx.    (18.31)


As R → ∞, we can ignore the big semicircle. The rest, after letting ε → 0,
gives


    0 = -i\pi + P\int_{-\infty}^{\infty} \frac{e^{i\alpha x}}{x}\,dx.    (18.32)


Again, the symbol P denotes a principal part integral. The −iπ comes from
the small semicircle. We get −1/2 the residue because we have only a half
circle, and that traversed in the “wrong” direction. (Remember that this
fractional residue result is only true when we indent to avoid a simple pole,
i.e. one that is of order one.)

Reading off the real and imaginary parts, we conclude that


    \int_{-\infty}^{\infty} \frac{\sin \alpha x}{x}\,dx = \pi,    P\int_{-\infty}^{\infty} \frac{\cos \alpha x}{x}\,dx = 0,    \alpha > 0.    (18.33)


No “P” is needed in the sine integral, as the integrand is finite at x = 0.
If we relax the condition that α > 0 and take into account that sine is an
odd function of its argument, we have


    \int_{-\infty}^{\infty} \frac{\sin \alpha x}{x}\,dx = \pi\,\mathrm{sgn}\,\alpha.    (18.34)


This identity is called Dirichlet’s discontinuous integral.
We can interpret Dirichlet’s integral as giving the Fourier transform of
the principal part distribution P(1/x) as


    P\int_{-\infty}^{\infty} \frac{e^{i\omega x}}{x}\,dx = i\pi\,\mathrm{sgn}\,\omega.    (18.35)

This will be of use later in the chapter.
Example:


Figure 18.4: Quadrant contour.

We will evaluate the integral


    \oint_C e^{iz} z^{a-1}\,dz    (18.36)


about the first-quadrant contour shown above. Observe that when 0 < a < 1
neither the large nor the small arc makes a contribution, and that there are
no poles. Hence, we deduce that


    0 = \int_0^\infty e^{ix} x^{a-1}\,dx - i \int_0^\infty e^{-y} y^{a-1} e^{(a-1)\frac{\pi}{2} i}\,dy,    0 < a < 1.    (18.37)

Taking real and imaginary parts, we find


    \int_0^\infty x^{a-1} \cos x\,dx = \Gamma(a) \cos\left(\frac{\pi}{2} a\right),    0 < a < 1,

    \int_0^\infty x^{a-1} \sin x\,dx = \Gamma(a) \sin\left(\frac{\pi}{2} a\right),    0 < a < 1,    (18.38)


where

    \Gamma(a) = \int_0^\infty y^{a-1} e^{-y}\,dy    (18.39)


is the Euler Gamma function.
Example: Fresnel integrals. Integrals of the form


    C(t) = \int_0^t \cos(\pi x^2/2)\,dx,    (18.40)

    S(t) = \int_0^t \sin(\pi x^2/2)\,dx,    (18.41)


occur in the theory of diffraction and are called Fresnel integrals after
Augustin Fresnel. They are naturally combined as


    C(t) + iS(t) = \int_0^t e^{i\pi x^2/2}\,dx.    (18.42)


The limit as t → ∞ exists and is finite. Even though the integrand does not
tend to zero at infinity, its rapid oscillation for large x is just sufficient to
ensure convergence.¹


¹ We can exhibit this convergence by setting x² = s and then integrating by parts to
get

    \int_0^t e^{i\pi x^2/2}\,dx = \int_0^1 e^{i\pi x^2/2}\,dx + \frac{1}{2} \int_1^{t^2} e^{i\pi s/2}\,\frac{ds}{s^{1/2}}
        = \int_0^1 e^{i\pi x^2/2}\,dx + \left[ \frac{e^{i\pi s/2}}{\pi i\,s^{1/2}} \right]_1^{t^2} + \frac{1}{2\pi i} \int_1^{t^2} e^{i\pi s/2}\,\frac{ds}{s^{3/2}}.

The right hand side is now manifestly convergent as t → ∞.
As t varies, the complex function C(t) + iS(t) traces out the Cornu Spiral,
named after Marie Alfred Cornu, a 19th century French optical physicist.


Figure 18.5: The Cornu spiral C(t) + iS(t) for t in the range −8 < t < 8.
The spiral in the first quadrant corresponds to positive values of t.


We can evaluate the limiting value

    C(\infty) + iS(\infty) = \int_0^\infty e^{i\pi x^2/2}\,dx    (18.43)

by deforming the contour off the real axis and onto a line of length L running
into the first quadrant at 45°, this being the direction of most rapid decrease
of the integrand.


Figure 18.6: Fresnel contour.

A circular arc returns the contour to the axis whence it continues to ∞, but
an estimate similar to that in Jordan’s lemma shows that the arc and the
subsequent segment on the real axis make a negligible contribution when L
is large. To evaluate the integral on the radial line we set z = e^{iπ/4}s, and so


    \int_0^{e^{i\pi/4}\infty} e^{i\pi z^2/2}\,dz = e^{i\pi/4} \int_0^\infty e^{-\pi s^2/2}\,ds = \frac{1}{\sqrt{2}} e^{i\pi/4} = \frac{1}{2}(1+i).    (18.44)


Figure 18.5 shows how C(t) + iS(t) orbits the limiting point 0.5 + 0.5i and
slowly spirals in towards it. Taking real and imaginary parts we have


    \int_0^\infty \cos\left(\frac{\pi x^2}{2}\right) dx = \int_0^\infty \sin\left(\frac{\pi x^2}{2}\right) dx = \frac{1}{2}.    (18.45)

18.2 The Schwarz reflection principle 


Theorem (Schwarz): Let f(z) be analytic in a domain D where ∂D includes
a segment of the real axis. Assume that f(z) is real when z is real. Then
there is a unique analytic continuation of f into the region D̄ (the mirror
image of D in the real axis) given by


    g(z) = \begin{cases} f(z), & z \in D, \\ \overline{f(\bar z)}, & z \in \bar D, \\ \text{either}, & z \in \mathbb{R}. \end{cases}    (18.46)


Figure 18.7: The domain D and its mirror image D̄.
The proof invokes Morera’s theorem to show analyticity, and then appeals
to the uniqueness of analytic continuations. Begin by looking at a closed
contour lying only in D̄:


    \oint_{\bar C} \overline{f(\bar z)}\,dz,    (18.47)


where C̄ = {η̄(t)} is the image of C = {η(t)} ⊂ D under reflection in the
real axis. We can rewrite this as


    \oint_{\bar C} \overline{f(\bar z)}\,dz = \oint \overline{f(\eta)}\,\frac{d\bar\eta}{dt}\,dt = \overline{\oint f(\eta)\,\frac{d\eta}{dt}\,dt} = \overline{\oint_C f(\eta)\,d\eta} = 0.    (18.48)


At the last step we have used Cauchy and the analyticity of f in D. Morera’s
theorem therefore confirms that g(z) is analytic in D̄. By breaking a general
contour up into parts in D and parts in D̄, we can similarly show that g(z)
is analytic in D ∪ D̄.

The important corollary is that if f(z) is analytic, and real on some
segment of the real axis, but has a cut along some other part of the real axis,
then f(x + iε) is the complex conjugate of f(x − iε) as we go over the cut. The
discontinuity disc f = f(x + iε) − f(x − iε) is therefore 2i Im f(x + iε).

Suppose f(z) is real on the negative real axis, and goes to zero as |z| → ∞.
Then, applying Cauchy to the contour Γ depicted in figure 18.8,


Figure 18.8: The contour Γ for the dispersion relation.

we find

    f(\zeta) = \frac{1}{\pi} \int_0^\infty \frac{\mathrm{Im} f(x+i\epsilon)}{x-\zeta}\,dx    (18.49)

for ζ within the contour. This is an example of a dispersion relation. The
name comes from the prototypical application of this technology to optical
dispersion, i.e. the variation of the refractive index with frequency.
If f(z) does not tend to zero at infinity then we cannot ignore the
contribution to Cauchy’s formula from the large circle. We can, however, still
write

    f(\zeta) = \frac{1}{2\pi i} \oint_\Gamma \frac{f(z)}{z-\zeta}\,dz,    (18.50)

and

    f(b) = \frac{1}{2\pi i} \oint_\Gamma \frac{f(z)}{z-b}\,dz,    (18.51)

for some convenient point b within the contour. We then subtract to get


    f(\zeta) = f(b) + \frac{(\zeta-b)}{2\pi i} \oint_\Gamma \frac{f(z)}{(z-b)(z-\zeta)}\,dz.    (18.52)

Because of the extra power of z downstairs in the integrand, we only need f
to be bounded at infinity for the contribution of the large circle to tend to
zero. If this is the case, we have


    f(\zeta) = f(b) + \frac{(\zeta-b)}{\pi} \int_0^\infty \frac{\mathrm{Im} f(x+i\epsilon)}{(x-b)(x-\zeta)}\,dx.    (18.53)


This is called a once-subtracted dispersion relation.

The dispersion relations derived above apply when ζ lies within the contour.
In physics applications we often need f(ζ) for ζ real and positive. What
happens as ζ approaches the axis, and we attempt to divide by zero in such
an integral, is summarized by the Plemelj formulae: If f(ζ) is defined by


    f(\zeta) = \frac{1}{\pi} \int_\Gamma \frac{\rho(z)}{z-\zeta}\,dz,    (18.54)

where Γ has a segment lying on the real axis, then, if x lies in this segment,

    \frac{1}{2}\left( f(x+i\epsilon) - f(x-i\epsilon) \right) = i\rho(x),

    \frac{1}{2}\left( f(x+i\epsilon) + f(x-i\epsilon) \right) = \frac{P}{\pi} \int_\Gamma \frac{\rho(x')}{x'-x}\,dx'.    (18.55)

As always, the “P” means that we are to delete an infinitesimal segment of
the contour lying symmetrically about the pole.


Figure 18.9: Origin of the Plemelj formulae. 


The Plemelj formulae hold under relatively mild conditions on the function
ρ(x). We won’t try to give a general proof, but in the case that ρ is analytic
the result is easy to understand: we can push the contour out of the way
and let ζ → x on the real axis from either above or below. In that case the
drawing in figure 18.9 shows how the sum of these two limits gives the
principal-part integral and how their difference gives an integral round a
small circle, and hence the residue ρ(x).

The Plemelj equations usually appear in physics papers as the “iε” cabala


    \frac{1}{x'-x \pm i\epsilon} = P\left( \frac{1}{x'-x} \right) \mp i\pi\delta(x'-x).    (18.56)


A limit ε → 0 is always to be understood in this formula.


Figure 18.10: Sketch of the real and imaginary parts of f(x') = 1/(x' − x − iε).

We can also appreciate the origin of the iε rule by examining the following
identity:

    \frac{1}{x'-(x \pm i\epsilon)} = \frac{x'-x}{(x'-x)^2+\epsilon^2} \pm \frac{i\epsilon}{(x'-x)^2+\epsilon^2}.    (18.57)

The first term is a symmetrically cut-off version of 1/(x' − x) and provides
the principal-part integral. The second term sharpens and tends to the delta
function ±iπδ(x' − x) as ε → 0.
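The claim that the second term in (18.57) tends to πδ(x' − x) can be tested by integrating the Lorentzian ε/((x' − x)² + ε²) against a smooth test function. The sketch below is illustrative only: the Gaussian test function, the function names, and the tangent substitution used to resolve the narrow peak are our own choices.

```python
import math

def gaussian(t):
    """Smooth test function rho(t) = exp(-t^2) (an arbitrary illustrative choice)."""
    return math.exp(-t * t)

def lorentzian_average(rho, x, eps, n=200_000):
    """Midpoint estimate of int rho(x') * eps / ((x' - x)^2 + eps^2) dx' over the real line.

    Substituting x' = x + eps*tan(phi) turns the Lorentzian weight into d(phi),
    so the integral becomes int_{-pi/2}^{pi/2} rho(x + eps*tan(phi)) d(phi).
    """
    h = math.pi / n
    total = 0.0
    for j in range(n):
        phi = -math.pi / 2 + (j + 0.5) * h
        total += rho(x + eps * math.tan(phi))
    return total * h

if __name__ == "__main__":
    x = 0.3
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, lorentzian_average(gaussian, x, eps), math.pi * gaussian(x))
```

As ε decreases the first number should approach π ρ(x), with a discrepancy that shrinks roughly in proportion to ε.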


Exercise 18.3: The Legendre function of the second kind Q_n(z) may be defined
for positive integer n by the integral


    Q_n(z) = \frac{1}{2} \int_{-1}^{1} \frac{(1-x^2)^n}{2^n (z-x)^{n+1}}\,dx,    z \notin [-1,1].


Use Rodrigues’ formula to show that for x ∈ [−1, 1] we have

    Q_n(x+i\epsilon) - Q_n(x-i\epsilon) = -i\pi P_n(x),


where P_n(x) is the Legendre Polynomial. Show further that Q_n(z) satisfies the
conditions for the validity of an unsubtracted dispersion relation, and hence
deduce Neumann’s formula:

    Q_n(z) = \frac{1}{2} \int_{-1}^{1} \frac{P_n(t)}{z-t}\,dt,    z \notin [-1,1].


18.2.1 Kramers-Kronig relations 


Causality is the usual source of analyticity in physical applications. If G(t)
is a response function


    \phi_{\rm response}(t) = \int_{-\infty}^{\infty} G(t-t')\,f_{\rm cause}(t')\,dt'    (18.58)


then for no effect to anticipate its cause we must have G(t) = 0 for t < 0.
The Fourier transform


    \tilde G(\omega) = \int_{-\infty}^{\infty} e^{i\omega t} G(t)\,dt    (18.59)


is then automatically analytic everywhere in the upper half plane. Suppose,
for example, we look at a forced, damped, harmonic oscillator whose
displacement x(t) obeys


    \ddot x + 2\gamma\dot x + (\Omega^2+\gamma^2)x = F(t),    (18.60)

where the friction coefficient γ is positive. As we saw earlier, the solution is
of the form


    x(t) = \int_{-\infty}^{\infty} G(t,t')F(t')\,dt',


where the Green function G(t,t') = 0 if t < t'. In this case


    G(t,t') = \begin{cases} \Omega^{-1} e^{-\gamma(t-t')} \sin\Omega(t-t'), & t > t', \\ 0, & t < t', \end{cases}    (18.61)

and so

    x(t) = \frac{1}{\Omega} \int_{-\infty}^{t} e^{-\gamma(t-t')} \sin\Omega(t-t')\,F(t')\,dt'.    (18.62)


Because the integral extends only from 0 to +∞, the Fourier transform of
G(t, 0),


    \tilde G(\omega) \equiv \frac{1}{\Omega} \int_0^\infty e^{i\omega t} e^{-\gamma t} \sin\Omega t\,dt,    (18.63)

is nicely convergent when Im ω > 0, as evidenced by

    \tilde G(\omega) = -\frac{1}{(\omega+i\gamma)^2-\Omega^2}    (18.64)


having no singularities in the upper half-plane.²
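The closed form (18.64) can be compared with a direct numerical evaluation of (18.63), including at a complex frequency in the upper half-plane. This is a minimal sketch under our own assumptions: the function names, the truncation time `t_max`, and the parameter values are illustrative and not taken from the text.

```python
import cmath
import math

def response_numeric(omega, gamma, Omega, n=100_000):
    """Simpson estimate of (1/Omega) * int_0^T e^{i*omega*t} e^{-gamma*t} sin(Omega*t) dt.

    The upper limit T = 60/gamma is chosen so that the neglected tail is of
    relative size exp(-60) when Im(omega) >= 0.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    t_max = 60.0 / gamma
    h = t_max / n
    total = 0j
    for j in range(n + 1):
        t = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * cmath.exp((1j * omega - gamma) * t) * math.sin(Omega * t)
    return total * h / (3.0 * Omega)

def response_closed(omega, gamma, Omega):
    """Closed form (18.64); its poles sit at omega = +/-Omega - i*gamma."""
    return -1.0 / ((omega + 1j * gamma) ** 2 - Omega ** 2)

if __name__ == "__main__":
    for omega in (1.3, 1.3 + 0.2j, 2.0):
        print(omega, response_numeric(omega, 0.5, 2.0), response_closed(omega, 0.5, 2.0))
```

Both poles of the closed form lie a distance γ below the real axis, which is why the integral converges for every frequency with non-negative imaginary part.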

Another example of such a causal function is provided by the complex,
frequency-dependent, refractive index of a material n(ω). This is defined so
that a travelling wave takes the form


    E(\mathbf{x},t) = e^{i n(\omega)\,\mathbf{k}\cdot\mathbf{x} - i\omega t}.    (18.65)

We can decompose n into its real and imaginary parts


    n(\omega) = n_R(\omega) + i n_I(\omega)
              = n_R(\omega) + \frac{i}{2|\mathbf{k}|}\gamma(\omega),    (18.66)


where γ is the extinction coefficient, defined so that the intensity falls off
as I ∝ exp(−γ n·x), where n = k/|k| is the direction of propagation. A
non-zero γ can arise from either energy absorption or scattering out of the
forward direction.


² If a pole in a response function manages to sneak into the upper half plane, then
the system will be unstable to exponentially growing oscillations. This may happen, for
example, when we design an electronic circuit containing a feedback loop. Such poles, and
the resultant instabilities, can be detected by applying the principle of the argument from
the last chapter. This method leads to the Nyquist stability criterion.


Being a causal response, the refractive index extends to a function
analytic in the upper half plane and n(ω) for real ω is the boundary value


    n(\omega)_{\rm physical} = \lim_{\epsilon\to 0} n(\omega+i\epsilon)    (18.67)


of this analytic function. Because a real (E = E*) incident wave must give
rise to a real wave in the material, and because the wave must decay in the
direction in which it is propagating, we have the reality conditions


    \gamma(-\omega+i\epsilon) = -\gamma(\omega+i\epsilon),
    n_R(-\omega+i\epsilon) = +n_R(\omega+i\epsilon)    (18.68)


with γ positive for positive frequency.


Many materials have a frequency range |ω| < |ω_min| where γ = 0, so
the material is transparent. For any such material n(ω) obeys the Schwarz
reflection principle and so there is an analytic continuation into the lower
half-plane. At frequencies ω where the material is not perfectly transparent,
the refractive index has an imaginary part even when ω is real. By Schwarz, n
must be discontinuous across the real axis at these frequencies: n(ω + iε) =
n_R + i n_I ≠ n(ω − iε) = n_R − i n_I. These discontinuities of 2i n_I usually
correspond to branch cuts.


No substance is able to respond to infinitely high frequency disturbances,
so n → 1 as |ω| → ∞, and we can apply our dispersion relation technology
to the function n − 1. We will need the contour shown below, which has cuts
for both positive and negative frequencies.

Figure 18.11: Contour for the n − 1 dispersion relation.


By applying the dispersion-relation strategy, we find


    n(\omega) = 1 + \frac{1}{\pi} \int_{\omega_{\min}}^{\infty} \frac{n_I(\omega')}{\omega'-\omega}\,d\omega' + \frac{1}{\pi} \int_{-\infty}^{-\omega_{\min}} \frac{n_I(\omega')}{\omega'-\omega}\,d\omega'    (18.69)


for ω within the contour. Using Plemelj we can now take ω onto the real axis
to get


    n_R(\omega) = 1 + \frac{P}{\pi} \int_{\omega_{\min}}^{\infty} \frac{n_I(\omega')}{\omega'-\omega}\,d\omega' + \frac{P}{\pi} \int_{-\infty}^{-\omega_{\min}} \frac{n_I(\omega')}{\omega'-\omega}\,d\omega'

                = 1 + \frac{P}{\pi} \int_{\omega_{\min}^2}^{\infty} \frac{n_I(\omega')}{\omega'^2-\omega^2}\,d\omega'^2

                = 1 + \frac{c}{\pi} P \int_{\omega_{\min}}^{\infty} \frac{\gamma(\omega')}{\omega'^2-\omega^2}\,d\omega'.    (18.70)


In the second line we have used the anti-symmetry of n_I(ω) to combine the
positive and negative frequency range integrals. In the last line we have used
the relation ω/k = c to make connection with the way this equation is written
in R. G. Newton’s authoritative Scattering Theory of Waves and Particles.
This relation, between the real and absorptive parts of the refractive index,
is called a Kramers-Kronig dispersion relation, after the original authors.³


³ H. A. Kramers, Nature, 117 (1926) 775; R. de L. Kronig, J. Opt. Soc. Am. 12 (1926)
547.
If n → 1 fast enough that ω²(n − 1) → 0 as |ω| → ∞, we can take the f
in the dispersion relation to be ω²(n − 1) and deduce that


    n_R = 1 + \frac{c}{\pi} P \int_{\omega_{\min}}^{\infty} \left( \frac{\omega'^2}{\omega^2} \right) \frac{\gamma(\omega')}{\omega'^2-\omega^2}\,d\omega',    (18.71)


another popular form of Kramers-Kronig. This second relation implies the
first, but not vice-versa, because the second demands more restrictive
behavior for n(ω).

Similar equations can be derived for other causal functions. A quantity
closely related to the refractive index is the frequency-dependent dielectric
“constant”


    \epsilon(\omega) = \epsilon_1 + i\epsilon_2.    (18.72)

Again ε → 1 as |ω| → ∞, and, proceeding as before, we deduce that

    \epsilon_1(\omega) = 1 + \frac{P}{\pi} \int_{\omega_{\min}^2}^{\infty} \frac{\epsilon_2(\omega')}{\omega'^2-\omega^2}\,d\omega'^2.    (18.73)

18.2.2 Hilbert transforms 


Suppose that f(x) is the boundary value on the real axis of a function
everywhere analytic in the upper half-plane, and suppose further that f(z) → 0
as |z| → ∞ there. Then we have

    f(z) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{f(x)}{x-z}\,dx    (18.74)

for z in the upper half-plane. This is because we may close the contour with
an upper semicircle without changing the value of the integral. For the same
reason the integral must give zero when z is taken in the lower half-plane.
Using the Plemelj formulae we deduce that on the real axis,


    f(x) = \frac{P}{\pi i} \int_{-\infty}^{\infty} \frac{f(x')}{x'-x}\,dx'.    (18.75)


We can use this strategy to derive the Kramers-Kronig relations even if n_I
never vanishes, and so we cannot use the Schwarz reflection principle.

The relation (18.75) suggests the definition of the Hilbert transform, Hψ,
of a function ψ(x), as


    (H\psi)(x) = \frac{P}{\pi} \int_{-\infty}^{\infty} \frac{\psi(x')}{x-x'}\,dx'.    (18.76)
Note the interchange of x, x' in the denominator of (18.76) when compared
with (18.75). This switch is to make the Hilbert transform into a convolution
integral. Equation (18.75) shows that a function that is the boundary value
of a function analytic and tending to zero at infinity in the upper half-plane is
automatically an eigenvector of H with eigenvalue −i. Similarly a function
that is the boundary value of a function analytic and tending to zero at
infinity in the lower half-plane will be an eigenvector with eigenvalue +i. (A
function analytic in the entire complex plane and tending to zero at infinity
must vanish identically by Liouville’s theorem.)

Returning now to our original f, which had eigenvalue −i, and decomposing
it as f(x) = f_R(x) + i f_I(x) we find that (18.75) becomes


    f_I(x) = (Hf_R)(x),
    f_R(x) = -(Hf_I)(x).    (18.77)

Conversely, if we are given a real function u(x) and set v(x) = (Hu)(x),
then, under some mild restrictions on u (that it lie in some L^p(R), p > 1, for
example, in which case v(x) is also in L^p(R)) the function


    f(z) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{u(x)+iv(x)}{x-z}\,dx    (18.78)


will be analytic in the upper half plane, tend to zero at infinity there, and
have u(x) + iv(x) as its boundary value as z approaches the real axis from
above. The last line of (18.77) therefore shows that we may recover u(x)
from v(x) as u(x) = −(Hv)(x). The Hilbert transform H : L^p(R) → L^p(R)
is therefore invertible, and its inverse is given by H^{−1} = −H. (Note that
the Hilbert transform of a constant is zero, but the L^p(R) condition excludes
constants from the domain of H, and so this fact does not conflict with
invertibility.)
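A concrete instance of (18.77) is f(x) = 1/(x + i), which is analytic in the upper half-plane and has f_R = x/(1 + x²) and f_I = −1/(1 + x²); the second line of (18.77) then says that the Hilbert transform of 1/(1 + x²) is x/(1 + x²). The sketch below checks this numerically. The function names are ours, and so is the rewriting of the principal-part integral (18.76) as a one-sided integral of [ψ(x − s) − ψ(x + s)]/s, which follows from substituting x' = x ∓ s and removes the singularity at x' = x.

```python
import math

def hilbert_transform(psi, x, s_max=200.0, n=100_000):
    """Midpoint estimate of (H psi)(x) = (1/pi) * int_0^s_max [psi(x-s) - psi(x+s)] / s ds.

    This is the principal-part integral (18.76) folded about x' = x and
    truncated at s_max; psi is assumed smooth and decaying.
    """
    h = s_max / n
    total = 0.0
    for j in range(n):
        s = (j + 0.5) * h
        total += (psi(x - s) - psi(x + s)) / s
    return total * h / math.pi

def lorentzian(x):
    """Test function 1/(1 + x^2), equal to minus the imaginary part of 1/(x + i)."""
    return 1.0 / (1.0 + x * x)

if __name__ == "__main__":
    for x in (-0.7, 0.0, 0.7, 2.0):
        print(x, hilbert_transform(lorentzian, x), x / (1.0 + x * x))
```

The two columns should agree closely, consistent with f = f_R + i f_I being an eigenvector of H with eigenvalue −i.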

Hilbert transforms are useful in signal processing. Given a real signal
X_R(t) we can take its Hilbert transform so as to find the corresponding
imaginary part, X_I(t), which serves to make the sum


    Z(t) = X_R(t) + iX_I(t) = A(t)e^{i\phi(t)}    (18.79)


analytic in the upper half-plane. This complex function is the analytic
signal.⁴ The real quantity A(t) is then known as the instantaneous amplitude,
or envelope, while φ(t) is the instantaneous phase and

    \omega_{\rm IF}(t) = \dot\phi(t)    (18.80)


is called the instantaneous frequency (IF). These quantities are used, for
example, in narrow band FM radio, in NMR, in geophysics, and in image
processing.


⁴ D. Gabor, J. Inst. Elec. Eng. (Part 3), 93 (1946) 429-457.

Exercise 18.4: Let f̃(ω) = ∫_{−∞}^{∞} e^{iωt} f(t) dt denote the Fourier transform of
f(t). Use the formula (18.35) for the Fourier transform of P(1/t), combined
with the convolution theorem for Fourier transforms, to show that the Fourier
transform of the Hilbert transform of f(t) is


    \widetilde{(Hf)}(\omega) = i\,\mathrm{sgn}(\omega)\,\tilde f(\omega).


Deduce that the analytic signal is derived from the original real signal by
suppressing all positive frequency components (those proportional to e^{−iωt}
with ω > 0) and multiplying the remaining negative-frequency amplitudes by
two.


Exercise 18.5: Suppose that φ_1(x) and φ_2(x) are real functions with finite
L²(R) norms.


a) Use the Fourier transform result from the previous exercise to show that


    \langle \varphi_1, \varphi_2 \rangle = \langle H\varphi_1, H\varphi_2 \rangle.


Thus, H is a unitary transformation from L²(R) → L²(R).
b) Use the fact that H² = −I to deduce that


    \langle H\varphi_1, \varphi_2 \rangle = -\langle \varphi_1, H\varphi_2 \rangle


and so H† = −H.
c) Conclude from part b) that


    \int_{-\infty}^{\infty} \varphi_1(x) \left( P\int_{-\infty}^{\infty} \frac{\varphi_2(y)}{x-y}\,dy \right) dx = \int_{-\infty}^{\infty} \varphi_2(y) \left( P\int_{-\infty}^{\infty} \frac{\varphi_1(x)}{x-y}\,dx \right) dy,

i.e., for L²(R) functions, it is legitimate to interchange the order of “P”
integration with ordinary integration.
d) By replacing φ_1(x) by a constant, and φ_2(x) by the Hilbert transform
of a function f with ∫ f dx ≠ 0, show that it is not always safe to
interchange the order of “P” integration with ordinary integration.
Exercise 18.6: Suppose that we are given real functions u_1(x) and u_2(x) and
substitute their Hilbert transforms v_1 = Hu_1, v_2 = Hu_2 into (18.78) to construct
analytic functions f_1(z) and f_2(z) that are analytic in the upper half-plane
and tend to zero at infinity there. Then, as we approach the real axis from
above, the product f_1(z)f_2(z) = F(z) has boundary value


    F_R(x+i\epsilon) + iF_I(x+i\epsilon) = (u_1u_2 - v_1v_2) + i(u_1v_2 + u_2v_1).


By assuming that F(z) satisfies the conditions for (18.77) to be applicable to
this boundary value, deduce that


    H((Hu_1)u_2) + H((Hu_2)u_1) - (Hu_1)(Hu_2) = -u_1u_2.    (*)


This result⁵ sometimes appears in the physics literature⁶ in the guise of the
distributional identity


    \frac{P}{x-y}\frac{P}{y-z} + \frac{P}{y-z}\frac{P}{z-x} + \frac{P}{z-x}\frac{P}{x-y} = -\pi^2\delta(x-y)\delta(x-z),


where P/(x − y) denotes the principal-part distribution P(1/(x − y)). This
attractively symmetric form conceals the fact that x is being kept fixed, while
y and z are being integrated over in specific orders. As the next exercise shows,
were we to freely re-arrange the integration order we could use the identity


    \frac{1}{x-y}\frac{1}{y-z} + \frac{1}{y-z}\frac{1}{z-x} + \frac{1}{z-x}\frac{1}{x-y} = 0,    x, y, z distinct,


to wrongly conclude that the right-hand side of (*) is zero.


⁵ A sufficient condition for its validity is that u_1 ∈ L^{p_1}(R), u_2 ∈ L^{p_2}(R), where p_1 and p_2
are greater than unity and 1/p_1 + 1/p_2 < 1. See F. G. Tricomi, Quart. J. Math. (Oxford),
series 2, 2, (1951) 199.
⁶ For example, in R. Jackiw, A. Strominger, Phys. Lett. 99B (1981) 133.


Problem 18.7: Show that the identity (*) from exercise 18.6 can be written as


    \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} \frac{u_1(z)u_2(y)}{(z-y)(y-x)}\,dz \right) dy = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} \frac{u_1(z)u_2(y)}{(z-y)(y-x)}\,dy \right) dz - \pi^2 u_1(x)u_2(x),


principal-part integrals being understood where necessary. This is a special
case of a more general change-of-integration-order formula


    \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} \frac{f(x,y,z)}{(z-y)(y-x)}\,dz \right) dy = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} \frac{f(x,y,z)}{(z-y)(y-x)}\,dy \right) dz - \pi^2 f(x,x,x),

which was first obtained by G. H. Hardy in 1908. Hardy’s result is often
referred to as the Poincaré-Bertrand theorem.

Verify Hardy’s formula for the particular case where x is zero and f(0, y, z)
is unity when both y and z lie within the interval [−a, a] but zero elsewhere.
You will need to show that


    \int_0^\infty \ln\left( \frac{a-x}{a+x} \right)^2 \frac{dx}{x} = -\pi^2\,\mathrm{sgn}(a).


(Hint: Observe that the integrand is singular at x = |a|. Explain why it is
legitimate to evaluate the improper integral by expanding the logarithms as a
power series in x or x^{−1}, and then integrating term-by-term.)


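Exercise 18.8: Use the licit interchange of “P” integration with ordinary
integration to show that


    \int_{-\infty}^{\infty} \varphi(x) \left( P\int_{-\infty}^{\infty} \frac{\varphi(y)}{x-y}\,dy \right)^2 dx = \frac{\pi^2}{3} \int_{-\infty}^{\infty} \varphi^3(x)\,dx.


Exercise 18.9: Let f(z) be analytic within the unit circle, and let u(θ) and
v(θ) be the boundary values of its real and imaginary parts, respectively, at
z = e^{iθ}. Use Plemelj to show that


    u(\theta) = -\frac{1}{2\pi} P\int_0^{2\pi} v(\theta') \cot\left( \frac{\theta-\theta'}{2} \right) d\theta' + \frac{1}{2\pi} \int_0^{2\pi} u(\theta')\,d\theta',

    v(\theta) = \frac{1}{2\pi} P\int_0^{2\pi} u(\theta') \cot\left( \frac{\theta-\theta'}{2} \right) d\theta' + \frac{1}{2\pi} \int_0^{2\pi} v(\theta')\,d\theta'.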


18.3 Partial-fraction and product expansions 




In this section we will study other useful representations of functions which 
devolve from their analyticity properties. 


18.3.1 Mittag-Leffler partial-fraction expansion


Let f(z) be a meromorphic function with poles (perhaps infinitely many)
at z = z_j, (j = 1, 2, 3, ...), where |z_1| < |z_2| < .... Let Γ_n be a contour
enclosing the first n poles. Suppose further (for ease of description) that the
poles are simple and have residue r_j. Then, for z inside Γ_n, we have


    \frac{1}{2\pi i} \oint_{\Gamma_n} \frac{f(z')}{z'-z}\,dz' = f(z) + \sum_{j=1}^{n} \frac{r_j}{z_j-z}.    (18.81)

We often want to apply this formula to trigonometric functions whose
periodicity means that they do not tend to zero at infinity. We therefore
employ the same strategy that we used for dispersion relations: we subtract
f(0) from f(z) to find


    f(z) - f(0) = \frac{z}{2\pi i} \oint_{\Gamma_n} \frac{f(z')}{z'(z'-z)}\,dz' + \sum_{j=1}^{n} r_j \left( \frac{1}{z-z_j} + \frac{1}{z_j} \right).    (18.82)


If we now assume that f(z) is uniformly bounded on the Γ_n (this meaning
that |f(z)| < A on Γ_n, with the same constant A working for all n) then
the integral tends to zero as n becomes large, yielding the partial fraction,
or Mittag-Leffler, decomposition


    f(z) = f(0) + \sum_{j=1}^{\infty} r_j \left( \frac{1}{z-z_j} + \frac{1}{z_j} \right).    (18.83)


Example: Consider cosec z. The residues of 1/(sin z) at its poles at z = nπ
are r_n = (−1)^n. We can take the Γ_n to be squares with corners (n + 1/2)(±1 ± i)π.
A bit of effort shows that cosec is uniformly bounded on them. To use
the formula as given, we first need to subtract the pole at z = 0, then


    \mathrm{cosec}\,z - \frac{1}{z} = {\sum_{n=-\infty}^{\infty}}' (-1)^n \left( \frac{1}{z-n\pi} + \frac{1}{n\pi} \right).    (18.84)


The prime on the summation symbol indicates that we omit the n = 0
term. The positive and negative n series converge separately, so we can add
them, and write the more compact expression


    \mathrm{cosec}\,z = \frac{1}{z} + 2z \sum_{n=1}^{\infty} (-1)^n \frac{1}{z^2-n^2\pi^2}.    (18.85)

Example: A similar method applied to cot z yields

    \cot z = \frac{1}{z} + {\sum_{n=-\infty}^{\infty}}' \left( \frac{1}{z-n\pi} + \frac{1}{n\pi} \right).    (18.86)

We can pair terms together to write this as

    \cot z = \frac{1}{z} + \sum_{n=1}^{\infty} \left( \frac{1}{z-n\pi} + \frac{1}{z+n\pi} \right)
           = \frac{1}{z} + \sum_{n=1}^{\infty} \frac{2z}{z^2-n^2\pi^2},    (18.87)

or

    \cot z = \lim_{N\to\infty} \sum_{n=-N}^{N} \frac{1}{z-n\pi}.    (18.88)


In the last formula it is important that the upper and lower limits of
summation be the same. Neither the sum over positive n nor the sum over negative
n converges separately. By taking asymmetric upper and lower limits we
could therefore obtain any desired number as the limit of the sum.
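The sensitivity of (18.88) to the choice of limits is easy to exhibit numerically. In the sketch below (our own illustration, with hypothetical function names) the symmetric sum approaches cot z, while summing from −N to 2N instead shifts the answer by approximately −(ln 2)/π, because the extra terms behave like −(1/π) Σ 1/n over N < n ≤ 2N.

```python
import cmath
import math

def cot_symmetric_sum(z, N):
    """Partial sum of (18.88): sum over n from -N to N of 1/(z - n*pi)."""
    total = 1.0 / z
    for n in range(1, N + 1):
        total += 1.0 / (z - n * math.pi) + 1.0 / (z + n * math.pi)
    return total

def cot_lopsided_sum(z, N):
    """The same terms summed over n from -N to 2N (asymmetric limits)."""
    total = cot_symmetric_sum(z, N)
    for n in range(N + 1, 2 * N + 1):
        total += 1.0 / (z - n * math.pi)
    return total

if __name__ == "__main__":
    z = 0.8 + 0.3j
    exact = cmath.cos(z) / cmath.sin(z)
    print(exact)
    print(cot_symmetric_sum(z, 20_000))
    print(cot_lopsided_sum(z, 20_000), exact - math.log(2.0) / math.pi)
```

Other ratios of upper to lower limit would give other shifts, which is the sense in which any desired number can be obtained.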


Exercise 18.10: Use Mittag-Leffler to show that


    \mathrm{cosec}^2 z = \sum_{n=-\infty}^{\infty} \frac{1}{(z+n\pi)^2}.


Now use this infinite series to give a one-line proof of the trigonometric identity

    \sum_{m=0}^{N-1} \mathrm{cosec}^2\left( z + \frac{m\pi}{N} \right) = N^2\,\mathrm{cosec}^2(Nz).


(Is there a comparably easy elementary derivation of this finite sum?) Take a
limit to conclude that


    \sum_{m=1}^{N-1} \mathrm{cosec}^2\left( \frac{m\pi}{N} \right) = \frac{1}{3}(N^2-1).

Exercise 18.11: From the partial fraction expansion for cot z, deduce that

    \frac{d}{dz} \ln[(\sin z)/z] = -\sum_{n=1}^{\infty} \frac{2z}{n^2\pi^2-z^2}.


Integrate this along a suitable path from z = 0, and so conclude that


    \sin z = z \prod_{n=1}^{\infty} \left( 1 - \frac{z^2}{n^2\pi^2} \right).


Exercise 18.12: By differentiating the partial fraction expansion for cot z, show
that, for k an integer ≥ 1, and Im z > 0, we have

    \sum_{n=-\infty}^{\infty} \frac{1}{(z+n)^{k+1}} = \frac{(-2\pi i)^{k+1}}{k!} \sum_{n=1}^{\infty} n^k e^{2\pi i n z}.


This is called Lipschitz’ formula.
Exercise 18.13: The Bernoulli numbers are defined by


    \frac{x}{e^x-1} = 1 + B_1 x + \sum_{k=1}^{\infty} B_{2k} \frac{x^{2k}}{(2k)!}.

The first few are B_1 = −1/2, B_2 = 1/6, B_4 = −1/30. Except for B_1, the B_n
are zero for n odd. Show that

    x\cot x = ix + \frac{2ix}{e^{2ix}-1} = 1 - \sum_{k=1}^{\infty} (-1)^{k+1} B_{2k} \frac{2^{2k}x^{2k}}{(2k)!}.


By expanding 1/(x² − n²π²) as a power series in x and comparing coefficients,
deduce that, for positive integer k,


    \sum_{n=1}^{\infty} \frac{1}{n^{2k}} = (-1)^{k+1} \pi^{2k} \frac{2^{2k-1}}{(2k)!} B_{2k}.

Exercise 18.14: Euler-Maclaurin sum formula. Let f(x) be a real-analytic
function. Use the formal expansion


    \frac{D}{e^D-1} = \sum_k B_k \frac{D^k}{k!} = 1 - \frac{1}{2}D + \frac{1}{6}\frac{D^2}{2!} - \frac{1}{30}\frac{D^4}{4!} + \cdots,


with D interpreted as d/dx, to obtain

    -f'(x) - f'(x+1) - f'(x+2) - \cdots = f(x) - \frac{1}{2}f'(x) + \frac{1}{6}\frac{1}{2!}f''(x) - \frac{1}{30}\frac{1}{4!}f''''(x) + \cdots.

By integrating this formula from 0 to M, where M is an integer, motivate the
Euler-Maclaurin sum formula:


    \frac{1}{2}f(0) + f(1) + \cdots + f(M-1) + \frac{1}{2}f(M)
        = \int_0^M f(x)\,dx + \sum_{k=1}^{\infty} \frac{B_{2k}}{(2k)!} \left( f^{(2k-1)}(M) - f^{(2k-1)}(0) \right).


The left hand side is the trapezium rule approximation to the integral on the
right hand side. This derivation gives no insight into whether the infinite-sum
correction to the trapezium rule converges (usually it does not), or what the
error will be if we truncate the sum after a finite number of terms. When
f(x) is a polynomial, however, only a finite number of derivatives f^{(k)} are
non-zero, and the result is exact.
18.3.2 Infinite product expansions


We can play a variant of the Mittag-Leffler game with suitable entire
functions g(z) and derive for them a representation as an infinite product.
Suppose that g(z) has simple zeros at z_j. Then (ln g)' = g'(z)/g(z) is
meromorphic with poles at z_j, all with unit residues. Assuming that it satisfies
the uniform boundedness condition, we now use Mittag-Leffler to write


    \frac{d}{dz} \ln g(z) = c + \sum_{j=1}^{\infty} \left( \frac{1}{z-z_j} + \frac{1}{z_j} \right).    (18.89)


Integrating up we have


    \ln g(z) = \ln g(0) + cz + \sum_{j=1}^{\infty} \left( \ln\left( 1 - \frac{z}{z_j} \right) + \frac{z}{z_j} \right),    (18.90)


where c = g'(0)/g(0). We now re-exponentiate to get


    g(z) = g(0)\,e^{cz} \prod_{j=1}^{\infty} \left( 1 - \frac{z}{z_j} \right) e^{z/z_j}.    (18.91)


Example: Let g(z) = sin z/z, then g(0) = 1, while the constant c, which is
the logarithmic derivative of g at z = 0, is zero, and


    \frac{\sin z}{z} = \prod_{n=1}^{\infty} \left( 1 - \frac{z}{n\pi} \right) e^{z/n\pi} \left( 1 + \frac{z}{n\pi} \right) e^{-z/n\pi}.    (18.92)


Thus


    \sin z = z \prod_{n=1}^{\infty} \left( 1 - \frac{z^2}{n^2\pi^2} \right).    (18.93)
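A truncated version of the product (18.93) already reproduces sin z well, with a relative error that shrinks roughly like z²/(π²N) as the number of factors N grows. The following sketch is illustrative only (the function name is ours) and works for complex z as well.

```python
import cmath
import math

def sine_product(z, N):
    """z * prod_{n=1}^{N} (1 - z^2 / (n*pi)^2), the truncation of (18.93) after N factors."""
    result = z
    for n in range(1, N + 1):
        result *= 1.0 - (z * z) / (n * math.pi) ** 2
    return result

if __name__ == "__main__":
    z = 1.2 + 0.5j
    for N in (10, 1_000, 20_000):
        print(N, sine_product(z, N), cmath.sin(z))
```

The slow, power-law convergence is typical of such products and is the reason the convergence question deserves separate attention.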


Convergence of infinite products 


We have derived several infinite product formulae without discussing the issue
of their convergence. For products of terms of the form (1 + a_n) with positive
a_n, we can reduce the question of convergence to that of Σ_{n=1}^{∞} a_n.

To see why this is so, let


    p_N = \prod_{n=1}^{N} (1 + a_n),    a_n > 0.    (18.94)

Then we have the inequalities


    1 + \sum_{n=1}^{N} a_n \le p_N < \exp\left\{ \sum_{n=1}^{N} a_n \right\}.    (18.95)


The infinite sum and product therefore converge or diverge together. If


    P = \prod_{n=1}^{\infty} (1 + |a_n|),    (18.96)

converges, we say that

    p = \prod_{n=1}^{\infty} (1 + a_n),    (18.97)


converges absolutely. As with infinite sums, absolute convergence implies
convergence, but not vice-versa. Unlike infinite sums, however, an infinite
product containing negative a_n can diverge to zero. If (1 + a_n) > 0 then
Π(1 + a_n) converges if Σ ln(1 + a_n) does, and we will say that Π(1 + a_n)
diverges to zero if Σ ln(1 + a_n) diverges to −∞.
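The inequalities (18.95) and the notion of divergence to zero can both be illustrated with a few lines of arithmetic. In this sketch (our own, with hypothetical helper names) the sequence a_n = 1/n² gives a convergent product, a_n = 1/n gives a product that grows without bound along with its sum, and a_n = −1/(n + 1) gives a product that diverges to zero.

```python
import math

def partial_product(a, N):
    """p_N = prod_{n=1}^{N} (1 + a(n))."""
    p = 1.0
    for n in range(1, N + 1):
        p *= 1.0 + a(n)
    return p

def partial_sum(a, N):
    """sum_{n=1}^{N} a(n)."""
    return sum(a(n) for n in range(1, N + 1))

def inverse_square(n):
    return 1.0 / (n * n)

def harmonic(n):
    return 1.0 / n

def negative_harmonic(n):
    return -1.0 / (n + 1)

if __name__ == "__main__":
    for name, a in (("1/n^2", inverse_square), ("1/n", harmonic)):
        for N in (10, 1000):
            s, p = partial_sum(a, N), partial_product(a, N)
            print(name, N, 1.0 + s, p, math.exp(s))
    for N in (10, 1000):
        print("-1/(n+1)", N, partial_product(negative_harmonic, N))
```

In the first two cases the middle number is sandwiched as in (18.95); in the last case the partial products equal 1/(N + 1) and tend to zero.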


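Exercise 18.15: Show that


    \prod_{n=1}^{N} \left( 1 + \frac{1}{n} \right) = N + 1,

    \prod_{n=2}^{N} \left( 1 - \frac{1}{n} \right) = \frac{1}{N}.


From these deduce that

    \prod_{n=2}^{\infty} \left( 1 - \frac{1}{n^2} \right) = \frac{1}{2}.


Exercise 18.16: For |z| < 1, show that


    \prod_{n=0}^{\infty} \left( 1 + z^{2^n} \right) = \frac{1}{1-z}.

(Hint: think binary)
Exercise 18.17: For |z| < 1, show that

    \prod_{n=1}^{\infty} (1 + z^n) = \prod_{n=1}^{\infty} \frac{1}{1 - z^{2n-1}}.


(Hint: 1 − z^{2n} = (1 − z^n)(1 + z^n).)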
18.4 Wiener-Hopf equations II


The theory of Hilbert transforms has shown us some of the consequences of
functions being analytic in the upper or lower half-plane. Another application
of these ideas is to Wiener-Hopf equations. Although we have discussed
Wiener-Hopf integral equations in chapter 9, it is only now that we possess
the tools to appreciate the general theory. We begin, however, with
the slightly simpler Wiener-Hopf sum equations, which are their discrete
analogue. Here, analyticity in the upper or lower half-plane is replaced by
analyticity within or without the unit circle.


18.4.1 Wiener-Hopf sum equations 


Consider the infinite system of equations


    y_n = \sum_{m=-\infty}^{\infty} a_{n-m} x_m,    -\infty < n < \infty    (18.98)


where we are given the y_n and are seeking the x_n.
If the a_n, y_n are the Fourier coefficients of smooth complex-valued
functions


    A(\theta) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta},

    Y(\theta) = \sum_{n=-\infty}^{\infty} y_n e^{in\theta},    (18.99)


then the system of equations is, in principle at least, easy to solve. We
introduce the function


    X(\theta) = \sum_{n=-\infty}^{\infty} x_n e^{in\theta},    (18.100)

and (18.98) becomes

    Y(\theta) = A(\theta)X(\theta).    (18.101)


From this, the desired x_n may be read off as the Fourier expansion coefficients
of Y(θ)/A(θ). We see that A(θ) must be nowhere zero or else the operator A
represented by the infinite matrix a_{n−m} will not be invertible. This technique
is a discrete version of the Fourier transform method for solving the integral
equation


    y(s) = \int_{-\infty}^{\infty} A(s-t)x(t)\,dt,    -\infty < s < \infty.    (18.102)
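The recipe "divide the Fourier series and read off the coefficients" can be sketched in a few lines. The example below is our own illustration: the coefficient dictionaries, the function names, and the use of M equally spaced sample points on the unit circle are all assumptions made for the demonstration, and the sequences are taken to have finite support so that the sampled sums are exact.

```python
import cmath
import math

def convolve(a, x):
    """y_n = sum_m a_{n-m} x_m for finitely supported sequences stored as dicts."""
    y = {}
    for i, ai in a.items():
        for m, xm in x.items():
            y[i + m] = y.get(i + m, 0.0) + ai * xm
    return y

def fourier_series(c, theta):
    """Evaluate sum_n c_n e^{i n theta} for a finitely supported sequence c."""
    return sum(v * cmath.exp(1j * n * theta) for n, v in c.items())

def solve_by_fourier(a, y, n_values, M=64):
    """Recover x_n from (18.98) via X(theta) = Y(theta)/A(theta), sampled at M points.

    The coefficients are extracted with the discrete inversion
    x_n = (1/M) * sum_j X(theta_j) e^{-i n theta_j}, valid for |n| < M/2.
    """
    thetas = [2.0 * math.pi * j / M for j in range(M)]
    X = [fourier_series(y, th) / fourier_series(a, th) for th in thetas]
    return {n: sum(Xj * cmath.exp(-1j * n * th) for Xj, th in zip(X, thetas)) / M
            for n in n_values}

if __name__ == "__main__":
    a = {-1: -0.25, 0: 1.0, 1: -0.25}   # A(theta) = 1 - 0.5*cos(theta), nowhere zero
    x_true = {-2: -1.0, 0: 1.0, 1: 2.0}
    y = convolve(a, x_true)
    x = solve_by_fourier(a, y, range(-4, 5))
    for n in range(-4, 5):
        print(n, round(x[n].real, 12), x_true.get(n, 0.0))
```

The chosen A(θ) never vanishes, so the division is harmless; a zero of A(θ) on the circle would make the quotient singular, in line with the invertibility remark above.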

The connection with complex analysis is made by regarding A(@), X (@), Y (8) 
as being functions on the unit circle in the z plane. If they are smooth enough 
we can extend their definition to an annulus about the unit circle, so that 


A(z) = 3 tie 
Key. = Line 
Ze s es (18.103) 


The x, may now be read off as the Laurent expansion coefficients of Y(z)/A(z). 
The discrete analogue of the Wiener-Hopf integral equation 


Us) | A(s —t)z(t)dt, O<s<oo (18.104) 
0 
is the Wiener-Hopf sum equation 


Wi Caning: OS tee on: (18.105) 


m=0 


This requires a more sophisticated approach. If you look back at our earlier 
discussion of Wiener-Hopf integral equations in chapter 9, you will see that 
the trick for solving them is to extend the definition of $y(s)$ to negative $s$
(analogously, the $y_n$ to negative $n$) and find these values at the same time as we
find $x(s)$ for positive $s$ (analogously, the $x_n$ for positive $n$).

We proceed by introducing the same functions A(z), X(z), Y(z) as before, 
but now keep careful track of whether their power-series expansions contain 
positive or negative powers of z. In doing so, we will discover that the 
Fredholm alternative governing the existence and uniqueness of the solutions 
will depend on the winding number $N = n(\Gamma, 0)$, where $\Gamma$ is the image of the
unit circle under the map $z \mapsto A(z)$; in other words, on how many times
$A(z)$ wraps around the origin as $z$ goes once round the unit circle.
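The winding number can be estimated numerically by accumulating the small changes in the phase of $A(z)$ as $z$ steps round the unit circle. This is a minimal sketch; the function name `winding_number` and the two test symbols are illustrative assumptions, and the sampling must be fine enough that the phase changes by less than $\pi$ between neighbouring points.

```python
import cmath

def winding_number(A, samples=2000):
    """Count how many times A(z) encircles the origin as z runs once round |z| = 1."""
    total = 0.0
    prev = A(1.0 + 0j)
    for k in range(1, samples + 1):
        cur = A(cmath.exp(2j * cmath.pi * k / samples))
        total += cmath.phase(cur / prev)   # phase increment between neighbouring samples
        prev = cur
    return round(total / (2 * cmath.pi))

# Two zeros at the origin plus factors that do not wind: N = 2.
A1 = lambda z: z**2 * (1 + 0.5 * z) / (1 - 0.3 / z)
# (z - 0.3)/z^2 has one zero and two poles inside the circle: N = -1.
A2 = lambda z: (1 - 0.3 / z) / z

N1 = winding_number(A1)
N2 = winding_number(A2)
```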

Suppose that A(z) is smooth enough that it is analytic in an annulus 
including the unit circle, and that we can factorize A(z) so that 


$$ A(z) = \lambda\, q_+(z)\, z^N\, [q_-(z)]^{-1}, \tag{18.106} $$

where

$$ q_+(z) = 1 + \sum_{n=1}^{\infty} q_n^+ z^n, \qquad
   q_-(z) = 1 + \sum_{n=1}^{\infty} q_n^- z^{-n}. \tag{18.107} $$


Here we demand that $q_+(z)$ be analytic and non-zero for $|z| < 1 + \epsilon$, and
that $q_-(z)$ be analytic and non-zero for $|1/z| < 1 + \epsilon$. These no-pole,
no-zero conditions ensure, via the principle of the argument, that the winding
numbers of $q_\pm(z)$ about the origin are zero, and so all the winding of $A(z)$ is
accounted for by the $N$-fold winding of the $z^N$ factor. The non-zero condition
also ensures that the reciprocals $[q_\pm(z)]^{-1}$ have the same class of expansions (i.e.
in positive or negative powers of $z$ only) as the direct functions.

We now introduce the notation $[F(z)]_+$ and $[F(z)]_-$, meaning that we
expand $F(z)$ as a Laurent series and retain only the positive powers of $z$
(including $z^0$), or only the negative powers (starting from $z^{-1}$), respectively.
Thus $F(z) = [F(z)]_+ + [F(z)]_-$. We will write $Y_\pm(z) = [Y(z)]_\pm$, and similarly
for $X(z)$. We can therefore rewrite (18.105) in the form

$$ \lambda z^N q_+(z)\, X_+(z) = [Y_+(z) + Y_-(z)]\, q_-(z). \tag{18.108} $$

If $N \ge 0$, and we break this equation into its positive and negative powers,
we find

$$ [Y_+ q_-]_+ = \lambda z^N q_+(z) X_+, \qquad
   [Y_+ q_-]_- = -Y_-\, q_-(z). \tag{18.109} $$

From the first of these equations we can read off the desired $x_n$ as the
positive-power Laurent coefficients of

$$ X_+(z) = [Y_+ q_-]_+ \left(\lambda z^N q_+(z)\right)^{-1}. \tag{18.110} $$
As a byproduct, the second allows us to find the coefficients $y_{-n}$ of $Y_-(z)$.
Observe that there is a condition on $Y_+$ for this to work: the power series
expansion of $\lambda z^N q_+(z) X_+$ starts with $z^N$, and so for a solution to exist the
first $N$ terms of $(Y_+ q_-)_+$ as a power series in $z$ must be zero. The given
vector $y_n$ must therefore satisfy $N$ consistency conditions. A formal way of
expressing this constraint begins by observing that it means that the range of
the operator $A$ represented by the matrix $a_{n-m}$ falls short, by $N$ dimensions,
of being the entire space of possible $y_n$. This is exactly the situation that
the notion of a "cokernel" is intended to capture. Recall that if $A: V \to V$,
then $\mathrm{Coker}\, A = V/\mathrm{Im}\, A$. We therefore have

$$ \dim\,[\mathrm{Coker}\, A] = N. $$
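The recipe (18.110) can be made concrete in the simplest case $N = 0$. The sketch below assumes an illustrative symbol $A(z) = (1 - pz)/(1 - r/z)$ with $|p|, |r| < 1$, so that $\lambda = 1$, $q_+ = 1 - pz$ and $q_- = 1 - r/z$; the helper names are hypothetical. Multiplying $Y_+$ by $q_-$ and keeping the non-negative powers gives coefficients $y_n - r y_{n+1}$, and dividing by $q_+$ is a one-term recursion.

```python
def a_coeff(n, p, r):
    """Laurent coefficients a_n of A(z) = (1 - p z)/(1 - r/z)."""
    if n > 1:
        return 0.0
    if n == 1:
        return -p
    return (1 - p * r) * r ** (-n)       # n <= 0

def solve_wiener_hopf_sum(y, p, r):
    """Solve y_n = sum_{m>=0} a_{n-m} x_m (n >= 0) via X_+ = [Y_+ q_-]_+ / q_+."""
    n_terms = len(y)
    y_ext = list(y) + [0.0]              # assumes y_n = 0 beyond the supplied range
    # [Y_+ q_-]_+ : the coefficient of z^n is y_n - r * y_{n+1}
    w = [y_ext[n] - r * y_ext[n + 1] for n in range(n_terms)]
    # X_+ (1 - p z) = W  means  x_n = w_n + p * x_{n-1}
    x = []
    for n in range(n_terms):
        x.append(w[n] + (p * x[n - 1] if n > 0 else 0.0))
    return x

p, r = 0.5, 0.3
x_true = [1.0, -2.0, 0.0, 4.0] + [0.0] * 8
size = len(x_true)
y = [sum(a_coeff(n - m, p, r) * x_true[m] for m in range(size)) for n in range(size)]
x_rec = solve_wiener_hopf_sum(y, p, r)
```

Because $a_n = 0$ for $n > 1$ in this example, a finitely supported $x$ gives a finitely supported $y$, so the truncation to `size` terms loses nothing.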
When $N < 0$, on the other hand, we have

$$ [Y_+(z) q_-(z)]_+ = [\lambda z^{-|N|} q_+(z) X_+(z)]_+, $$
$$ [Y_+(z) q_-(z)]_- = -Y_-(z) q_-(z) + [\lambda z^{-|N|} q_+(z) X_+(z)]_-. \tag{18.111} $$

Here the last term in the second equation contains no more than $|N|$ terms.
Because of the $z^{-|N|}$, we can add to $X_+$ any multiple of $Z_+(z) = z^n [q_+(z)]^{-1}$
for $n = 0, \ldots, |N|-1$, and still have a solution. Thus the solution is not unique.
Instead, we have $\dim\,[\mathrm{Ker}\,(A)] = |N|$.


We have therefore shown that

$$ \mathrm{Index}\,(A) \stackrel{\rm def}{=} \dim\,(\mathrm{Ker}\, A) - \dim\,(\mathrm{Coker}\, A) = -N. $$


This connection between a topological quantity — in the present case the 
winding number — and the difference in dimension of the kernel and cokernel 
is an example of an index theorem. 

We now need to show that we can indeed factorize A(z) in the desired 
manner. When A(z) is a rational function, the factorization is straightfor- 
ward: if 


$$ A(z) = C\, \frac{\prod_n (z - a_n)}{\prod_m (z - b_m)}, \tag{18.112} $$

we simply take

$$ q_+(z) = \frac{\prod_{|a_n|>1} (1 - z/a_n)}{\prod_{|b_m|>1} (1 - z/b_m)}, \tag{18.113} $$

where the products are over the linear factors corresponding to poles and
zeros outside the unit circle, and

$$ q_-(z) = \frac{\prod_{|b_m|<1} (1 - b_m/z)}{\prod_{|a_n|<1} (1 - a_n/z)}, \tag{18.114} $$

containing the linear factors corresponding to poles and zeros inside the unit
circle. The constant $\lambda$ and the power $z^N$ in equation (18.106) are the factors
that we have extracted from the right-hand sides of (18.113) and (18.114),
respectively, in order to leave 1's as the first term in each linear factor.
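For a rational symbol the factorization can be carried out mechanically. The sketch below assumes $A(z) = C\prod_n (z-a_n)/\prod_m (z-b_m)$ with no zeros or poles on the unit circle; the function name `factorize_rational` and the sample zeros and poles are illustrative. Each factor outside the circle is written as $(z-a) = -a(1-z/a)$ and each factor inside as $(z-a) = z(1-a/z)$, which is where $\lambda$ and $z^N$ come from.

```python
import cmath

def factorize_rational(C, zeros, poles):
    """Return (lam, N, q_plus, q_minus) with A(z) = lam * z**N * q_plus(z) / q_minus(z)."""
    z_in = [a for a in zeros if abs(a) < 1]
    z_out = [a for a in zeros if abs(a) > 1]
    p_in = [b for b in poles if abs(b) < 1]
    p_out = [b for b in poles if abs(b) > 1]
    N = len(z_in) - len(p_in)            # net winding: zeros minus poles inside
    lam = C
    for a in z_out:
        lam *= -a
    for b in p_out:
        lam /= -b

    def q_plus(z):                       # zeros and poles outside the unit circle
        val = 1.0 + 0j
        for a in z_out:
            val *= 1 - z / a
        for b in p_out:
            val /= 1 - z / b
        return val

    def q_minus(z):                      # zeros and poles inside the unit circle
        val = 1.0 + 0j
        for b in p_in:
            val *= 1 - b / z
        for a in z_in:
            val /= 1 - a / z
        return val

    return lam, N, q_plus, q_minus

C, zeros, poles = 1.5, [2.0, 0.5j, -0.25], [3.0j, 0.4]
lam, N, q_plus, q_minus = factorize_rational(C, zeros, poles)

z0 = cmath.exp(0.7j)                     # a test point on the unit circle
direct = C
for a in zeros:
    direct *= z0 - a
for b in poles:
    direct /= z0 - b
rebuilt = lam * z0**N * q_plus(z0) / q_minus(z0)
```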
More generally, we take the logarithm of 


$$ z^{-N} A(z) = \lambda\, q_+(z)\,(q_-(z))^{-1} \tag{18.115} $$

to get

$$ \ln[z^{-N} A(z)] = \ln[\lambda q_+(z)] - \ln[q_-(z)], \tag{18.116} $$

where we desire $\ln[\lambda q_+(z)]$ to be the boundary value of a function analytic
within the unit circle, and $\ln[q_-(z)]$ the boundary value of a function analytic
outside the unit circle and with $q_-(z)$ tending to unity as $|z| \to \infty$. The
factor of $z^{-N}$ in the logarithm serves to undo the winding of the argument
of $A(z)$, and results in a single-valued logarithm on the unit circle. Plemelj
now shows that

$$ Q(z) = \frac{1}{2\pi i} \oint_{|\zeta|=1} \frac{\ln[\zeta^{-N} A(\zeta)]}{\zeta - z}\, d\zeta \tag{18.117} $$

provides us with the desired factorization. This function $Q(z)$ is everywhere
analytic except for a branch cut along the unit circle, and its branches, $Q_+$
within and $Q_-$ without the circle, differ by $\ln[z^{-N} A(z)]$. We therefore have

$$ \lambda q_+(z) = e^{Q_+(z)}, \qquad q_-(z) = e^{Q_-(z)}. \tag{18.118} $$


The expression for Q as an integral shows that Q(z) ~ const./z as |z| 
goes to infinity and so guarantees that q_(z) has the desired limit of unity 
there. 
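On the unit circle the split (18.116) amounts to sorting the Fourier coefficients of $\ln[z^{-N}A(z)]$ by the sign of their index. The following sketch does this numerically for an assumed test symbol $A(z) = 2z(1 - z/2)/(1 - 0.3/z)$, for which $N = 1$, $\lambda q_+ = 2(1 - z/2)$ and $q_- = 1 - 0.3/z$ are known in closed form. The sample count `M` and the truncation at 40 coefficients are illustrative choices, and the principal branch of the logarithm is adequate only because the phase of $z^{-N}A(z)$ stays well inside $(-\pi, \pi)$ for this example.

```python
import cmath

M = 256                                  # number of sample points on the unit circle
N = 1                                    # winding number of the test symbol
A = lambda z: 2.0 * z * (1 - z / 2) / (1 - 0.3 / z)

thetas = [2 * cmath.pi * k / M for k in range(M)]
log_vals = [cmath.log(A(cmath.exp(1j * t)) * cmath.exp(-1j * N * t)) for t in thetas]

def fourier_coeff(n):
    """Coefficient of z^n in the Laurent expansion of ln[z^(-N) A(z)] on |z| = 1."""
    return sum(f * cmath.exp(-1j * n * t) for f, t in zip(log_vals, thetas)) / M

c_pos = [fourier_coeff(n) for n in range(0, 40)]    # powers z^0, z^1, ...
c_neg = [fourier_coeff(-n) for n in range(1, 40)]   # powers z^-1, z^-2, ...

def lam_q_plus(z):
    """exp of the non-negative-power part; valid for |z| < 1."""
    return cmath.exp(sum(c * z**n for n, c in enumerate(c_pos)))

def q_minus(z):
    """exp of minus the negative-power part; valid for |z| > 1."""
    return cmath.exp(-sum(c * z**(-n) for n, c in enumerate(c_neg, start=1)))
```

The negative-power part enters with a minus sign because (18.116) has $-\ln[q_-(z)]$ on its right-hand side.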

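The task of finding this factorization is known as the scalar Riemann-Hilbert
problem. In effect, we are decomposing the infinite matrix

$$ A = \begin{pmatrix}
 \ddots & \vdots & \vdots & \vdots & \\
 \cdots & a_0 & a_1 & a_2 & \cdots \\
 \cdots & a_{-1} & a_0 & a_1 & \cdots \\
 \cdots & a_{-2} & a_{-1} & a_0 & \cdots \\
 & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} \tag{18.119} $$

into the product of an upper triangular matrix

$$ U = \lambda \begin{pmatrix}
 \ddots & \vdots & \vdots & \vdots & \\
 \cdots & 1 & q_1^+ & q_2^+ & \cdots \\
 \cdots & 0 & 1 & q_1^+ & \cdots \\
 \cdots & 0 & 0 & 1 & \cdots \\
 & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \tag{18.120} $$

a lower triangular matrix $L$, where

$$ L^{-1} = \begin{pmatrix}
 \ddots & \vdots & \vdots & \vdots & \\
 \cdots & 1 & 0 & 0 & \cdots \\
 \cdots & q_1^- & 1 & 0 & \cdots \\
 \cdots & q_2^- & q_1^- & 1 & \cdots \\
 & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} \tag{18.121} $$

has 1's on the diagonal, and a matrix $\Lambda^N$ which is zero everywhere
except for a line of 1's located $N$ steps above the main diagonal. The set
of triangular matrices with unit diagonal forms a group, so the inversion
required to obtain $L$ results in a matrix of the same form. The resulting
Birkhoff factorization

$$ A = L \Lambda^N U, \tag{18.122} $$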


is an infinite-dimensional extension of the Gauss-Bruhat (or generalized LU)
decomposition of a matrix. The finite-dimensional Gauss-Bruhat decomposition
provides a factorization of a matrix $A \in \mathrm{GL}(n)$ as

$$ A = L \Pi U, \tag{18.123} $$

where $L$ is a lower triangular matrix with 1's on the diagonal, $U$ is an upper
triangular matrix with no zeros on the diagonal, and $\Pi$ is a permutation
matrix, i.e. a matrix that permutes the basis vectors by having one entry of
1 in each row and in each column, and all other entries zero. Our present $\Lambda^N$
is playing the role of such a matrix. The matrix $\Pi$ is uniquely determined
by $A$. The $L$ and $U$ matrices become unique if $L$ is chosen so that $\Pi^T L \Pi$
is also lower triangular.


18.4.2 Wiener-Hopf integral equations 


We now carry over our insights from the simpler sum equations to Wiener-Hopf
integral equations

$$ \int_0^{\infty} K(x-y)\,\phi(y)\,dy = f(x), \qquad x > 0, \tag{18.124} $$

by imagining replacing the unit circle by a circle of radius $R$, and then taking
$R \to \infty$ in such a way that the sums go over to integrals. In this way many
features are retained: the problem is still solved by factorizing the Fourier
transform

$$ \tilde K(k) = \int_{-\infty}^{\infty} K(x)\, e^{ikx}\, dx \tag{18.125} $$

of the kernel, and there remains an index theorem

$$ \dim\,(\mathrm{Ker}\, K) - \dim\,(\mathrm{Coker}\, K) = -N, \tag{18.126} $$

but $N$ now counts the winding of the phase of $\tilde K(k)$ as $k$ ranges over the real
axis:

$$ N = \frac{1}{2\pi} \arg \tilde K \,\Big|_{k=-\infty}^{k=+\infty}. \tag{18.127} $$

One restriction arises though: we will require $K$ to be of the form

$$ K(x-y) = \delta(x-y) + g(x-y) \tag{18.128} $$


for some continuous function g(x). Our discussion is therefore being re- 
stricted to Wiener-Hopf Integral equations of the second kind. 

The restriction comes about because we will seek to obtain a factorization
of $\tilde K$ as

$$ \tau(k) \tilde K(k) = \exp\{Q_+(k) - Q_-(k)\} = q_+(k)\,(q_-(k))^{-1}, \tag{18.129} $$

where $q_+(k) \equiv \exp\{Q_+(k)\}$ is analytic and non-zero in the upper half $k$-plane
and $q_-(k) \equiv \exp\{Q_-(k)\}$ is analytic and non-zero in the lower half-plane. The
factor $\tau(k)$ is a phase such as

$$ \tau(k) = \left( \frac{k+i}{k-i} \right)^{N}, \tag{18.130} $$

which winds $-N$ times and serves to undo the $+N$ phase winding in
$\tilde K$. The $Q_\pm(k)$ will be the boundary values from above and below the real
axis, respectively, of

$$ Q(k) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\ln[\tau(\kappa) \tilde K(\kappa)]}{\kappa - k}\, d\kappa. \tag{18.131} $$

The convergence of this infinite integral requires that $\ln[\tau(\kappa) \tilde K(\kappa)]$ go to zero
at infinity, or, in other words,

$$ \lim_{k \to \infty} \tilde K(k) = 1. \tag{18.132} $$

This, in turn, requires that the original $K(x)$ contain a delta function.
Example: We will solve the problem

$$ \phi(x) - \lambda \int_0^{\infty} e^{-|x-y| - \alpha(x-y)}\, \phi(y)\, dy = f(x), \qquad x > 0. \tag{18.133} $$

We require that $0 < \alpha < 1$. The upper bound on $\alpha$ is necessary for the
integral kernel to be bounded. We will also assume for simplicity that $\lambda <
1/2$. Following the same strategy as in the sum case, we extend the integral
equation to the entire range of $x$ by writing

$$ \phi(x) - \lambda \int_0^{\infty} e^{-|x-y| - \alpha(x-y)}\, \phi(y)\, dy = f(x) + g(x), \tag{18.134} $$

where $f(x)$ is non-zero only for $x > 0$ and $g(x)$ is non-zero only for $x < 0$.
The Fourier transform of this equation is

$$ \left( \frac{(k+i\alpha)^2 + a^2}{(k+i\alpha)^2 + 1} \right) \tilde\phi_+(k) = \tilde f_+(k) + \tilde g_-(k), \tag{18.135} $$

where $a^2 = 1 - 2\lambda$ and the $\pm$ subscripts are to remind us that $\tilde\phi(k)$ and $\tilde f(k)$
are analytic in the upper half-plane, and $\tilde g(k)$ in the lower. We will use the
notation $H_+$ for the space of functions analytic in the upper half-plane, and
$H_-$ for functions analytic in the lower half-plane, and so

$$ \tilde\phi_+(k),\ \tilde f_+(k) \in H_+, \qquad \tilde g_-(k) \in H_-. \tag{18.136} $$


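We can factorize

$$ \tilde K(k) = \frac{(k+i\alpha)^2 + a^2}{(k+i\alpha)^2 + 1}
   = \frac{[k + i(\alpha - a)][k + i(\alpha + a)]}{[k + i(\alpha - 1)][k + i(\alpha + 1)]}. \tag{18.137} $$

Now suppose that $a$ is small enough that $\alpha \pm a > 0$ and so the numerator
has two zeros in the lower half-plane, and the denominator has one zero in each
of the upper and lower half-planes. The change of phase in $\tilde K(k)$ as we go
from minus to plus infinity is therefore $-2\pi$, and so the index is $N = -1$.
We should therefore multiply $\tilde K$ by

$$ \tau(k) = \left( \frac{k+i}{k-i} \right)^{-1} \tag{18.138} $$

before seeking to break it into its $q_\pm$ factors. We can however equally well
take

$$ \tau(k) = \frac{k + i(\alpha - 1)}{k + i(\alpha - a)}, \tag{18.139} $$

as this also undoes the winding and allows us to factorize with

$$ q_+(k) = \frac{k + i(\alpha + a)}{k + i(\alpha + 1)}, \qquad q_-(k) = 1. \tag{18.140} $$

The resultant equation analogous to (18.108) is therefore

$$ \frac{k + i(\alpha + a)}{k + i(\alpha + 1)}\, \tilde\phi_+
   = \frac{k + i(\alpha - 1)}{k + i(\alpha - a)}\, \tilde f_+
   + \frac{k + i(\alpha - 1)}{k + i(\alpha - a)}\, \tilde g_-, $$
$$ q_+ \tilde\phi_+ = (\tau q_-) \tilde f_+ + \tau q_- \tilde g_-. \tag{18.141} $$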


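The second line of this equation shows the interpretation of the first line in
terms of the objects in the general theory. The left hand side is in $H_+$,
i.e. analytic in the upper half-plane. The first term on the right is also in
$H_+$. (We are lucky. More generally it would have to be decomposed into its
$H_\pm$ parts.) If it were not for the $\tau(k)$, the last term would be in $H_-$, but
it has a potential pole at $k = -i(\alpha - a)$. We therefore remove this pole by
subtracting a term

$$ \frac{-\beta}{k + i(\alpha - a)} $$

(an element of $H_+$) from each side of the equation before projecting onto the
$H_\pm$ parts. After projecting, we find that

$$ H_+: \quad \frac{k + i(\alpha + a)}{k + i(\alpha + 1)}\, \tilde\phi_+
   - \frac{k + i(\alpha - 1)}{k + i(\alpha - a)}\, \tilde f_+
   + \frac{\beta}{k + i(\alpha - a)} = 0, $$
$$ H_-: \quad \frac{k + i(\alpha - 1)}{k + i(\alpha - a)}\, \tilde g_-
   + \frac{\beta}{k + i(\alpha - a)} = 0. \tag{18.142} $$

We solve for $\tilde\phi_+(k)$ and $\tilde g_-(k)$:

$$ \tilde\phi_+(k) = \frac{(k+i\alpha)^2 + 1}{(k+i\alpha)^2 + a^2}\, \tilde f_+
   - \beta\, \frac{k + i(\alpha + 1)}{(k+i\alpha)^2 + a^2}, $$
$$ \tilde g_-(k) = -\frac{\beta}{k + i(\alpha - 1)}. \tag{18.143} $$

Observe that $\tilde g_-(k)$ is always in $H_-$ because its only singularity is in the upper
half-plane for any $\beta$. The constant $\beta$ is therefore arbitrary. Finally, we invert
the Fourier transform, using

$$ \mathcal{F}\left( \theta(x)\, e^{-\alpha x} \sinh ax \right) = -\frac{a}{(k+i\alpha)^2 + a^2}, \qquad (\alpha \pm a) > 0, \tag{18.144} $$

to find that

$$ \phi(x) = f(x) - \frac{2\lambda}{a} \int_0^x e^{-\alpha(x-y)} \sinh a(x-y)\, f(y)\, dy
   + \beta' \left\{ (a-1)\, e^{-(\alpha+a)x} + (a+1)\, e^{-(\alpha-a)x} \right\}, \tag{18.145} $$

where $\beta'$ (proportional to $\beta$) is an arbitrary constant.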
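The arbitrary term in the solution can be checked directly: with $f = 0$, the function $(a-1)e^{-(\alpha+a)x} + (a+1)e^{-(\alpha-a)x}$ should satisfy the homogeneous equation. The sketch below verifies this by brute-force quadrature. The parameter values $\alpha = 0.6$, $\lambda = 0.42$ (so $a = 0.4$), the midpoint rule, and the cutoff at $y = 80$ are all illustrative assumptions.

```python
import math

alpha, lam = 0.6, 0.42
a = math.sqrt(1 - 2 * lam)               # a = 0.4, so alpha - a > 0

def phi0(x):
    """Candidate solution of the homogeneous (f = 0) equation."""
    return (a - 1) * math.exp(-(alpha + a) * x) + (a + 1) * math.exp(-(alpha - a) * x)

def kernel(x, y):
    return math.exp(-abs(x - y) - alpha * (x - y))

def midpoint(f, lo, hi, n):
    """Composite midpoint rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def residual(x, cutoff=80.0, n=20000):
    """phi0(x) - lam * integral of kernel * phi0, split at the kink y = x."""
    g = lambda y: kernel(x, y) * phi0(y)
    integral = midpoint(g, 0.0, x, n) + midpoint(g, x, cutoff, n)
    return phi0(x) - lam * integral

residuals = [residual(x) for x in (0.0, 0.5, 2.0)]
```

The residuals vanish to within the quadrature error, consistent with the solution being non-unique when $N = -1$.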

By taking $\alpha$ in the range $-1 < \alpha < 0$ with $(\alpha \pm a) < 0$, we make the index
to be $N = +1$. We will then find there is a condition on $f(x)$ for the solution
to exist. This condition is, of course, that $f(x)$ be orthogonal to the solution

$$ \phi_0(x) = \left\{ (a-1)\, e^{(\alpha-a)x} + (a+1)\, e^{(\alpha+a)x} \right\} \tag{18.146} $$

of the homogeneous adjoint problem, this being the $f(x) = 0$ case of the $\alpha > 0$
problem that we have just solved.


18.5 Further exercises and problems


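Exercise 18.18: Contour Integration: Use the calculus of residues to evaluate
the following integrals:

$$ I_1 = \int_0^{2\pi} \frac{d\theta}{(a + b\cos\theta)^2}, \qquad 0 < b < a. $$

$$ I_2 = \int_0^{2\pi} \frac{\cos^2 3\theta}{1 - 2a\cos 2\theta + a^2}\, d\theta, \qquad 0 < a < 1. $$

$$ I_3 = \int_0^{\infty} \frac{x^{\alpha}}{(1 + x^2)^2}\, dx, \qquad -1 < \alpha < 2. $$

These are not meant to be easy! You will have to dig for the residues.

Answers:

$$ I_1 = \frac{2\pi a}{(a^2 - b^2)^{3/2}}, $$
$$ I_2 = \frac{\pi(1 + a^3)}{1 - a^2} = \frac{\pi(1 - a + a^2)}{1 - a}, $$
$$ I_3 = \frac{\pi(1 - \alpha)}{4\cos(\pi\alpha/2)}. $$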

Exercise 18.19: By considering the integral of

$$ f(z) = \ln(1 - e^{2iz}) = \ln(-2i e^{iz} \sin z) $$

around the indented rectangle

Figure 18.12: Indented rectangle.

with vertices $0$, $\pi$, $\pi + iY$, $iY$, and letting $Y$ become large, evaluate the integral

$$ I = \int_0^{\pi} \ln(\sin x)\, dx. $$

Explain how the fact that $\epsilon \ln \epsilon \to 0$ as $\epsilon \to 0$ allows us to ignore contributions
from the small indentations. You should also provide justification for any other
discarded contributions. Take care to make consistent choices of the branch of
the logarithm, especially if expanding $\ln(-2i e^{ix} \sin x) = ix + \ln 2 + \ln(\sin x) +
\ln(-i)$.

(Ans: $-\pi \ln 2$.)


Exercise 18.20: By integrating a suitable function around the quadrant
containing the point $z_0 = e^{i\pi/4}$, evaluate the integral

$$ I(\alpha) = \int_0^{\infty} \frac{x^{\alpha-1}}{1 + x^4}\, dx, \qquad 0 < \alpha < 4. $$

It should only be necessary to consider the residue at $z_0$.
(Ans: $(\pi/4)\,\mathrm{cosec}\,(\pi\alpha/4)$.)


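Exercise 18.21: In section 5.5.1 we considered the causal Green function for
the damped harmonic oscillator

$$ G(t) = \begin{cases} \dfrac{1}{\Omega}\, e^{-\gamma t} \sin(\Omega t), & t > 0, \\ 0, & t \le 0, \end{cases} $$

and showed that its Fourier transform

$$ \int_{-\infty}^{\infty} e^{i\omega t}\, G(t)\, dt = \frac{1}{\Omega^2 - (\omega + i\gamma)^2} \tag{18.147} $$

had no singularities in the upper half-plane. Use Jordan's lemma to compute
the inverse Fourier transform

$$ \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-i\omega t}}{\Omega^2 - (\omega + i\gamma)^2}\, d\omega $$

and verify that it reproduces $G(t)$.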


Problem 18.22: Jordan's Lemma and one-dimensional scattering theory. In
problem 4.13 we considered the one-dimensional scattering problem solutions

$$ \psi_k(x) = \begin{cases} e^{ikx} + r_L(k)\, e^{-ikx}, & x \in L, \\ t_L(k)\, e^{ikx}, & x \in R, \end{cases} \qquad k > 0, $$
$$ \psi_k(x) = \begin{cases} t_R(k)\, e^{ikx}, & x \in L, \\ e^{ikx} + r_R(k)\, e^{-ikx}, & x \in R, \end{cases} \qquad k < 0, $$

and claimed that the bound-state contributions to the completeness relation
were given in terms of the reflection and transmission coefficients as

$$ \sum_{\rm bound} \psi_n^*(x)\, \psi_n(x')
 = -\int_{-\infty}^{\infty} \frac{dk}{2\pi}\, r_L(k)\, e^{-ik(x+x')}, \qquad x, x' \in L, $$
$$ = -\int_{-\infty}^{\infty} \frac{dk}{2\pi}\, t_L(k)\, e^{-ik(x-x')}, \qquad x \in L,\ x' \in R, $$
$$ = -\int_{-\infty}^{\infty} \frac{dk}{2\pi}\, t_R(k)\, e^{-ik(x-x')}, \qquad x \in R,\ x' \in L, $$
$$ = -\int_{-\infty}^{\infty} \frac{dk}{2\pi}\, r_R(k)\, e^{-ik(x+x')}, \qquad x, x' \in R. $$

The eigenfunctions

$$ \psi_k^{(+)}(x) = \begin{cases} e^{ikx} + r_L(k)\, e^{-ikx}, & x \in L, \\ t_L(k)\, e^{ikx}, & x \in R, \end{cases} $$

and

$$ \psi_k^{(-)}(x) = \begin{cases} t_R(k)\, e^{ikx}, & x \in L, \\ e^{ikx} + r_R(k)\, e^{-ikx}, & x \in R, \end{cases} $$

are initially defined for $k$ real and positive ($\psi_k^{(+)}$) or for $k$ real and negative
($\psi_k^{(-)}$), but they separately have analytic continuations to all of $k \in \mathbb{C}$. The
reflection and transmission coefficients $r_{L,R}(k)$ and $t_{L,R}(k)$ are also analytic
functions of $k$, and obey $r_{L,R}(k) = r^*_{L,R}(-k^*)$, $t_{L,R}(k) = t^*_{L,R}(-k^*)$.

a) By inspecting the formulae for $\psi_k^{(+)}(x)$, show that the bound states $\psi_n(x)$,
   with $E_n = -\kappa_n^2$, are proportional to $\psi_k^{(+)}(x)$ evaluated at points $k = i\kappa_n$
   on the positive imaginary axis at which $r_L(k)$ and $t_L(k)$ simultaneously
   have poles. Similarly show that these same bound states are proportional
   to $\psi_k^{(-)}(x)$ evaluated at points $-i\kappa_n$ on the negative imaginary axis at
   which $r_R(k)$ and $t_R(k)$ have poles. (All these functions $\psi_k^{(\pm)}(x)$, $r_{R,L}(k)$,
   $t_{R,L}(k)$ may have branch points and other singularities in the half-plane
   on the opposite side of the real axis from the bound-state poles.)

b) Use Jordan's lemma to evaluate the Fourier transforms given above in
   terms of the position and residues of the bound-state poles. Confirm
   that your answers are of the form

$$ \sum_n A_n^*[\mathrm{sgn}(x)]\, e^{-\kappa_n |x|}\, A_n[\mathrm{sgn}(x')]\, e^{-\kappa_n |x'|}, $$

   as you would expect for the bound-state contribution to the completeness
   relation.
Exercise 18.23: Lattice Matsubara sums: Let $\omega_n = \exp\{i\pi(2n+1)/N\}$, for
$n = 0, \ldots, N-1$, be the $N$-th roots of $(-1)$. Show that, for suitable analytic
functions $f(z)$, the sum

$$ S = \frac{1}{N} \sum_{n=0}^{N-1} f(\omega_n) $$

can be written as an integral

$$ S = \frac{1}{2\pi i} \oint_C \frac{dz}{z}\, \frac{z^N}{z^N + 1}\, f(z). $$

Here $C$ consists of a pair of oppositely oriented concentric circles. The annulus
formed by the circles should include all the roots of $(-1)$, but exclude all
singularities of $f$. Use this result to show that, for $N$ even,

$$ \frac{1}{N} \sum_{n=0}^{N-1} \frac{\sinh E}{\sinh^2 E + \sin^2 \frac{(2n+1)\pi}{N}}
   = \frac{1}{\cosh E} \tanh \frac{NE}{2}. $$

Let $N \to \infty$ while scaling $E \to 0$ in some suitable manner, and hence show
that

$$ \sum_{n=-\infty}^{\infty} \frac{a}{a^2 + [(2n+1)\pi]^2} = \frac{1}{2} \tanh \frac{a}{2}. $$

(Hint: If you are careless, you will end up differing by a factor of two from this
last formula. There are two regions in the finite sum that tend to the infinite
sum in the large $N$ limit.)


Problem 18.24: If we define $\chi(x) = e^{\alpha x} \phi(x)$, and $F(x) = e^{\alpha x} f(x)$, then the
Wiener-Hopf equation

$$ \phi(x) - \lambda \int_0^{\infty} e^{-|x-y| - \alpha(x-y)}\, \phi(y)\, dy = f(x), \qquad x > 0, $$

becomes

$$ \chi(x) - \lambda \int_0^{\infty} e^{-|x-y|}\, \chi(y)\, dy = F(x), \qquad x > 0, $$

all mention of $\alpha$ having disappeared! Why then does our answer, worked out
in such detail in section 18.4.2, depend on the parameter $\alpha$? Show that if $\alpha$ is
small enough that $\alpha + a$ is positive and $\alpha - a$ is negative, then $\chi(x)$ really is
independent of $\alpha$. (Hint: What tacit assumptions about function spaces does
our use of Fourier transforms entail? How does the inverse Fourier transform
of $[(k + i\alpha)^2 + a^2]^{-1}$ vary with $\alpha$?)

Chapter 19


Special Functions and Complex 
Variables 


In this chapter we will apply complex analytic methods so as to obtain a 
wider view of some of the special functions of mathematical physics than can 
be obtained from the real axis. The standard text in this field remains the 
venerable Course of Modern Analysis of E. T. Whittaker and G. N. Watson. 


19.1 The Gamma function 


We begin by examining how Euler's "Gamma function" $\Gamma(z)$ behaves when
$z$ is allowed to become complex. You probably have some acquaintance with
this creature. The usual definition is

$$ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt, \qquad \mathrm{Re}\, z > 0, \quad \text{(definition A)}. \tag{19.1} $$

The restriction on the real part of $z$ is necessary to make the integral converge.
We can, however, analytically continue $\Gamma(z)$ to a meromorphic function on
all of $\mathbb{C}$. An integration by parts, based on

$$ \frac{d}{dt}\left( t^z e^{-t} \right) = z t^{z-1} e^{-t} - t^z e^{-t}, \tag{19.2} $$

shows that

$$ \left[ t^z e^{-t} \right]_0^{\infty} = z \int_0^{\infty} t^{z-1} e^{-t}\, dt - \int_0^{\infty} t^z e^{-t}\, dt. \tag{19.3} $$

The integrated out part vanishes at both limits, provided the real part of $z$
is greater than zero. Thus we obtain the recurrence relation

$$ \Gamma(z+1) = z\,\Gamma(z). \tag{19.4} $$

Since $\Gamma(1) = 1$, we deduce that

$$ \Gamma(n) = (n-1)(n-2) \cdots 1 = (n-1)!. \tag{19.5} $$


We can use the recurrence relation (19.4) to extend the definition of $\Gamma(z)$ to
the left half-plane, where the real part of $z$ is negative. Choosing an integer
$n$ such that the real part of $z + n$ is positive, we write

$$ \Gamma(z) = \frac{\Gamma(z+n)}{z(z+1) \cdots (z+n-1)}. \tag{19.6} $$

We see that the extended $\Gamma(z)$ has poles at zero, and at the negative integers.
The residue of the pole at $z = -n$ is $(-1)^n/n!$.
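The continuation (19.6) is easy to exercise numerically. The sketch below, restricted for simplicity to real non-integer arguments, steps $z$ to the right until it is positive and then divides out the accumulated factors; the function name `gamma_continued` is an illustrative choice, and Python's `math.gamma` is used both for the right-half-plane values and as an independent check.

```python
import math

def gamma_continued(z):
    """Gamma(z) for real non-integer z, using (19.6) to reach a positive argument."""
    n = 0
    denom = 1.0
    while z + n <= 0:
        denom *= z + n                   # accumulates z (z+1) ... (z+n-1)
        n += 1
    return math.gamma(z + n) / denom

value = gamma_continued(-2.5)

# Residue at z = -3: eps * Gamma(-3 + eps) should approach (-1)^3 / 3! = -1/6.
eps = 1e-7
residue_estimate = eps * gamma_continued(-3 + eps)
```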

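We can also view the analytic continuation as an example of Taylor series
subtraction. Let us recall how this works. Suppose that $-1 < \mathrm{Re}\, x < 0$.
Then, from

$$ \frac{d}{dt}\left( t^x e^{-t} \right) = x t^{x-1} e^{-t} - t^x e^{-t}, \tag{19.7} $$

we have

$$ \left[ t^x e^{-t} \right]_{\epsilon}^{\infty} = x \int_{\epsilon}^{\infty} dt\, t^{x-1} e^{-t} - \int_{\epsilon}^{\infty} dt\, t^x e^{-t}. \tag{19.8} $$

Here we have cut off the integral at the lower limit so as to avoid the
divergence near $t = 0$. Evaluating the left-hand side and dividing by $x$ we
find

$$ -\frac{1}{x}\, \epsilon^x e^{-\epsilon} = \int_{\epsilon}^{\infty} dt\, t^{x-1} e^{-t} - \frac{1}{x} \int_{\epsilon}^{\infty} dt\, t^x e^{-t}. \tag{19.9} $$

Since, for this range of $x$,

$$ -\frac{1}{x}\, \epsilon^x = \int_{\epsilon}^{\infty} dt\, t^{x-1}, \tag{19.10} $$

we can rewrite (19.9) as

$$ \frac{1}{x} \int_{\epsilon}^{\infty} dt\, t^x e^{-t} = \int_{\epsilon}^{\infty} dt\, t^{x-1} \left( e^{-t} - 1 \right) + \frac{1}{x}\, \epsilon^x \left( e^{-\epsilon} - 1 \right). \tag{19.11} $$

The integral on the right-hand side of this last expression is convergent as
$\epsilon \to 0$, and the final term vanishes in this limit because $\mathrm{Re}\, x > -1$, so we
may safely take the limit and find

$$ \frac{1}{x}\, \Gamma(x+1) = \int_0^{\infty} dt\, t^{x-1} \left( e^{-t} - 1 \right). \tag{19.12} $$

Since the left-hand side is equal to $\Gamma(x)$, we have shown that

$$ \Gamma(x) = \int_0^{\infty} dt\, t^{x-1} \left( e^{-t} - 1 \right), \qquad -1 < \mathrm{Re}\, x < 0. \tag{19.13} $$

Similarly, if $-2 < \mathrm{Re}\, x < -1$, we can show that

$$ \Gamma(x) = \int_0^{\infty} dt\, t^{x-1} \left( e^{-t} - 1 + t \right). \tag{19.14} $$

Thus the analytic continuation of the original integral is given by a new
integral in which we have subtracted exactly as many terms from the Taylor
expansion of $e^{-t}$ as are needed to just make the integral convergent at the
lower limit.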
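As a numerical sanity check of the subtracted integral (19.13), the sketch below evaluates it at $x = -1/2$ and compares with $\Gamma(-1/2) = -2\sqrt{\pi}$. The splitting of the range at $t = 1$, the substitution $t = u^2$ near the origin, the upper cutoff at $t = 60$ and the midpoint rule are all illustrative choices. The quadrature is only tuned for $x = -1/2$, where the substituted integrand is smooth at $u = 0$.

```python
import math

def midpoint(f, lo, hi, n):
    """Composite midpoint rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def gamma_subtracted(x):
    """Integral of t^(x-1) (e^(-t) - 1) over (0, inf), as in (19.13)."""
    # [0, 1]: with t = u^2 the integrand becomes 2 u^(2x-1) (e^(-u^2) - 1)
    head = midpoint(lambda u: 2 * u ** (2 * x - 1) * math.expm1(-u * u), 0.0, 1.0, 20000)
    # [1, inf): the "-1" piece integrates exactly to +1/x for x < 0;
    # the e^(-t) piece is negligible beyond t = 60
    tail = midpoint(lambda t: t ** (x - 1) * math.exp(-t), 1.0, 60.0, 60000) + 1.0 / x
    return head + tail

estimate = gamma_subtracted(-0.5)
exact = -2.0 * math.sqrt(math.pi)
```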

Other useful identities, usually proved by elementary real-variable methods,
include Euler's "Beta function" identity,

$$ B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \int_0^1 (1-t)^{a-1}\, t^{b-1}\, dt \tag{19.15} $$

(which, as the Veneziano formula, was the original inspiration for string
theory) and

$$ \Gamma(z)\Gamma(1-z) = \pi\, \mathrm{cosec}\, \pi z. \tag{19.16} $$

The proofs of both formulae begin in the same way: set $t = y^2$, $x^2$, so that

$$ \Gamma(a)\Gamma(b) = 4 \int_0^{\infty} y^{2a-1} e^{-y^2}\, dy \int_0^{\infty} x^{2b-1} e^{-x^2}\, dx $$
$$ = 4 \int_0^{\infty}\!\!\int_0^{\infty} e^{-(x^2+y^2)}\, x^{2b-1} y^{2a-1}\, dx\, dy $$
$$ = 2 \int_0^{\infty} e^{-r^2} (r^2)^{a+b-1}\, d(r^2) \int_0^{\pi/2} \sin^{2a-1}\theta\, \cos^{2b-1}\theta\, d\theta. $$

We have appealed to Fubini's theorem twice: once to turn a product of
integrals into a double integral, and once (after setting $x = r\cos\theta$, $y =
r\sin\theta$) to turn the double integral back into a product of decoupled integrals.
In the second factor of the third line we can now change variables to $t = \sin^2\theta$
and obtain the Beta function identity. If, on the other hand, we put $a = 1-z$,
$b = z$, we have

$$ \Gamma(z)\Gamma(1-z) = 2 \int_0^{\infty} e^{-r^2}\, d(r^2) \int_0^{\pi/2} \cot^{2z-1}\theta\, d\theta
   = 2 \int_0^{\pi/2} \cot^{2z-1}\theta\, d\theta. \tag{19.17} $$

Now set $\cot\theta = \zeta$. The last integral then becomes (see exercise 18.1):

$$ 2 \int_0^{\infty} \frac{\zeta^{2z-1}}{\zeta^2 + 1}\, d\zeta = \pi\, \mathrm{cosec}\, \pi z, \qquad 0 < z < 1, \tag{19.18} $$

establishing the claimed result. Although this last integral has a restriction
on the range of $z$, the result (19.16) holds for all $z$ by analytic continuation. If we
put $z = 1/2$, we find that $(\Gamma(1/2))^2 = \pi$. Because the definition A integral
for $\Gamma(1/2)$ is manifestly positive, the positive square root is the correct one,
and

$$ \Gamma(1/2) = \sqrt{\pi}. \tag{19.19} $$
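Both identities are easy to spot-check numerically. The sketch below compares a midpoint-rule estimate of the Beta integral with the ratio of Gamma functions for one illustrative pair $(a, b)$, and tests the reflection formula at a point inside $0 < z < 1$ and at a point reached only by analytic continuation. The sample values and the helper name `beta_integral` are assumptions made for the illustration.

```python
import math

def beta_integral(a, b, n=50000):
    """Midpoint-rule estimate of the integral of (1-t)^(a-1) t^(b-1) over [0, 1]."""
    h = 1.0 / n
    return h * sum((1 - (i + 0.5) * h) ** (a - 1) * ((i + 0.5) * h) ** (b - 1)
                   for i in range(n))

a, b = 2.5, 3.5
beta_numeric = beta_integral(a, b)
beta_gamma = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def reflection_gap(z):
    """Gamma(z) Gamma(1-z) minus pi cosec(pi z); should vanish."""
    return math.gamma(z) * math.gamma(1 - z) - math.pi / math.sin(math.pi * z)

gap_inside = reflection_gap(0.3)         # within 0 < z < 1
gap_continued = reflection_gap(-1.7)     # outside the range of the integral proof
```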
The integral in definition A for $\Gamma(z)$ is only convergent for $\mathrm{Re}\, z > 0$. A
more powerful definition, involving an integral that converges for all $z$, is

$$ \frac{1}{\Gamma(z)} = \frac{1}{2\pi i} \int_C \frac{e^t}{t^z}\, dt, \qquad \text{(definition B)}. \tag{19.20} $$

Figure 19.1: Definition "B" contour for $\Gamma(z)$.

Here, $C$ is a contour originating at $t = -\infty - i\epsilon$, below the negative real
axis (on which a cut serves to make $t^{-z}$ single valued), rounding the origin,
and then heading back to $t = -\infty + i\epsilon$, this time staying above the cut. We
take $\arg t$ to be $+\pi$ immediately above the cut, and $-\pi$ immediately below
it. This new definition is due to Hankel.

For $z$ an integer, the cut is unnecessary and we can replace the contour
by a circle about $t = 0$ and so find

$$ \frac{1}{\Gamma(0)} = 0; \qquad \frac{1}{\Gamma(n)} = \frac{1}{(n-1)!}, \quad n > 0. \tag{19.21} $$

Thus, definitions A and B agree on the integers. It is less obvious that they
agree for all $z$. A hint that this is true stems from integrating by parts

$$ \frac{1}{\Gamma(z)} = \frac{1}{2\pi i} \left[ \frac{-e^t}{(z-1)\, t^{z-1}} \right]_{-\infty - i\epsilon}^{-\infty + i\epsilon}
   + \frac{1}{(z-1)\, 2\pi i} \int_C \frac{e^t}{t^{z-1}}\, dt
   = \frac{1}{(z-1)}\, \frac{1}{\Gamma(z-1)}. \tag{19.22} $$

The integrated-out part vanishes because $e^t$ is zero at $-\infty$. Thus the "new"
gamma function obeys the same functional relation as the "old" one.
To show that the equivalence holds for non-integer $z$ we will examine the
definition-B expression for $\Gamma(1-z)$:

$$ \frac{1}{\Gamma(1-z)} = \frac{1}{2\pi i} \int_C e^t\, t^{z-1}\, dt. \tag{19.23} $$

We will assume initially that $\mathrm{Re}\, z > 0$, so that there is no contribution to
the integral from the small circle about the origin. We can therefore focus
on the contribution from the discontinuity across the cut, which is

$$ \frac{1}{\Gamma(1-z)} = \frac{1}{2\pi i} \int_C e^t\, t^{z-1}\, dt
   = -\frac{1}{2\pi i} \left( 2i \sin \pi(z-1) \right) \int_0^{\infty} t^{z-1} e^{-t}\, dt
   = \frac{1}{\pi} \sin \pi z \int_0^{\infty} t^{z-1} e^{-t}\, dt. \tag{19.24} $$

The proof is then completed by using $\Gamma(z)\Gamma(1-z) = \pi\, \mathrm{cosec}\, \pi z$, which we
proved using definition A, to show that, under definition A, the right hand
side is indeed equal to $1/\Gamma(1-z)$. We now use the uniqueness of analytic
continuation, noting that if two analytic functions agree on the region $\mathrm{Re}\, z >
0$, then they agree everywhere.


Infinite product for I'(z) 


The function $\Gamma(z)$ has poles at $z = 0, -1, -2, \ldots$, and therefore $(z\Gamma(z))^{-1} =
(\Gamma(z+1))^{-1}$ has zeros at $z = -1, -2, \ldots$. Furthermore, the integral in
"definition B" converges for all $z$, and so $1/\Gamma(z)$ has no singularities in the finite
$z$ plane, i.e. it is an entire function. This means that we can use the infinite
product formula from section 18.3.2:

$$ g(z) = g(0)\, e^{cz} \prod_{j=1}^{\infty} \left\{ \left( 1 - \frac{z}{z_j} \right) e^{z/z_j} \right\}. \tag{19.25} $$

We need to recall the definition of the Euler-Mascheroni constant $\gamma = -\Gamma'(1) =
0.5772157\ldots$, and that $\Gamma(1) = 1$. Then

$$ \frac{1}{\Gamma(z)} = z\, e^{\gamma z} \prod_{n=1}^{\infty} \left\{ \left( 1 + \frac{z}{n} \right) e^{-z/n} \right\}. \tag{19.26} $$
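The infinite product converges slowly, but a truncated version already reproduces $1/\Gamma(z)$ to several digits. The sketch below is an illustration only: the truncation at 200000 factors is an arbitrary choice that leaves a relative error of roughly $z^2/(2 \times 200000)$, and the function name `inv_gamma_product` is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329         # Euler-Mascheroni constant

def inv_gamma_product(z, terms=200000):
    """Truncated product z e^(gamma z) prod_n (1 + z/n) e^(-z/n) for 1/Gamma(z)."""
    prod = z * math.exp(EULER_GAMMA * z)
    for n in range(1, terms + 1):
        prod *= (1 + z / n) * math.exp(-z / n)
    return prod

at_half = inv_gamma_product(0.5)         # compare with 1/Gamma(1/2) = 1/sqrt(pi)
at_negative = inv_gamma_product(-1.5)    # a point in the left half-plane
at_pole = inv_gamma_product(-2.0)        # 1/Gamma vanishes at the poles of Gamma
```

The exact zero at $z = -2$ comes from the single factor $(1 + z/2)$, which shows how the zeros of $1/\Gamma$ are built into the product.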


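We can use this formula to compute

$$ \frac{1}{\Gamma(z)\Gamma(1-z)} = \frac{1}{(-z)\,\Gamma(z)\Gamma(-z)}
   = z \prod_{n=1}^{\infty} \left\{ \left( 1 + \frac{z}{n} \right) e^{-z/n} \right\}
       \prod_{n=1}^{\infty} \left\{ \left( 1 - \frac{z}{n} \right) e^{z/n} \right\} $$
$$ = z \prod_{n=1}^{\infty} \left( 1 - \frac{z^2}{n^2} \right) = \frac{1}{\pi} \sin \pi z, $$

and so obtain another (but not really independent) demonstration that
$\Gamma(z)\Gamma(1-z) = \pi\, \mathrm{cosec}\, \pi z$.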


Exercise 19.1: Starting from the infinite product formula for $\Gamma(z)$, show that

$$ \frac{d^2}{dz^2} \ln \Gamma(z) = \sum_{n=0}^{\infty} \frac{1}{(z+n)^2}. $$

Compare this "half series" with the expansion

$$ \pi^2\, \mathrm{cosec}^2\, \pi z = \sum_{n=-\infty}^{\infty} \frac{1}{(z+n)^2}. $$


19.2 Linear differential equations 


When linear differential equations have coefficients that are meromorphic
functions, their solutions can be extended off the real line and into the
complex plane. The broader horizon then allows us to see much more of their
structure.

19.2.1 Monodromy


Consider the linear differential equation 


$$ Ly \equiv y'' + p(z)\, y' + q(z)\, y = 0, \tag{19.27} $$

where $p$ and $q$ are meromorphic. Recall that the point $z = a$ is a regular
singular point of the equation if $p$ or $q$ is singular there but

$$ (z-a)\, p(z), \qquad (z-a)^2 q(z) \tag{19.28} $$

are both analytic at $z = a$. We know, from the explicit construction of power
series solutions, that near a regular singular point $y$ is a sum of functions of
the form $y = (z-a)^{\alpha} \varphi(z)$ or $y = (z-a)^{\alpha} (\ln(z-a)\, \varphi(z) + \chi(z))$, where both
$\varphi(z)$ and $\chi(z)$ are analytic near $z = a$. We now examine this fact from a
more topological perspective.

Suppose that $y_1$ and $y_2$ are linearly independent solutions of $Ly = 0$. Start
from some ordinary (non-singular) point of the equation and analytically
continue the solutions round the singularity at $z = a$ and back to the starting
point. The continued functions $\tilde y_1$ and $\tilde y_2$ will not in general coincide with
the original solutions but, being still solutions of the equation, they must be
linear combinations of them. Therefore

$$ \begin{pmatrix} \tilde y_1 \\ \tilde y_2 \end{pmatrix}
   = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
     \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \tag{19.29} $$

for some constants $a_{ij}$. By a suitable redefinition of $y_{1,2}$ we may either
diagonalize this monodromy matrix to find

$$ \begin{pmatrix} \tilde y_1 \\ \tilde y_2 \end{pmatrix}
   = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
     \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \tag{19.30} $$

or, if the eigenvalues coincide and the matrix is not diagonalizable, reduce it
to a Jordan form

$$ \begin{pmatrix} \tilde y_1 \\ \tilde y_2 \end{pmatrix}
   = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}
     \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. \tag{19.31} $$

These matrix equations are satisfied, in the diagonalizable case, by functions
of the form

$$ y_1 = (z-a)^{\alpha_1} \varphi_1(z), \qquad y_2 = (z-a)^{\alpha_2} \varphi_2(z), \tag{19.32} $$

where $\lambda_k = e^{2\pi i \alpha_k}$, and $\varphi_k(z)$ is single valued near $z = a$. In the Jordan-form
case we must have

$$ y_1 = (z-a)^{\alpha} \left[ \varphi_1(z) + \frac{1}{2\pi i \lambda} \ln(z-a)\, \varphi_2(z) \right], \qquad
   y_2 = (z-a)^{\alpha} \varphi_2(z), \tag{19.33} $$

where, again, the $\varphi_k(z)$ are single valued. Notice that coincidence of the
monodromy eigenvalues $\lambda_1$ and $\lambda_2$ does not require the exponents $\alpha_1$ and
$\alpha_2$ to be the same, only that they differ by an integer. This is the same
Frobenius condition that signals the presence of a logarithm in the traditional
series solution.

The occurrence of fractional powers and logarithms in solutions near a 
regular singular point is therefore quite natural. 
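The Jordan-form monodromy can be checked by following the functions of (19.33) once round the singular point. In the sketch below the exponent $\alpha$, the single-valued factors $\varphi_1 = 1 + (z-a)$ and $\varphi_2 = 1 - 2(z-a)$, and the starting point are all illustrative assumptions. The multivalued pieces are handled by tracking the angle $\theta$ continuously rather than reducing it modulo $2\pi$.

```python
import cmath

alpha = 0.3 + 0.1j                       # illustrative exponent
lam = cmath.exp(2j * cmath.pi * alpha)   # monodromy eigenvalue

def solutions(r, theta):
    """y1, y2 of (19.33) at z - a = r e^(i theta), with theta tracked continuously."""
    w = r * cmath.exp(1j * theta)        # z - a; single valued in theta
    log = cmath.log(r) + 1j * theta      # ln(z - a) on the continued branch
    power = cmath.exp(alpha * log)       # (z - a)^alpha on the continued branch
    phi1 = 1 + w                         # assumed single-valued factors
    phi2 = 1 - 2 * w
    y1 = power * (phi1 + log * phi2 / (2j * cmath.pi * lam))
    y2 = power * phi2
    return y1, y2

r, theta0 = 0.2, 0.4
y1, y2 = solutions(r, theta0)
y1_tilde, y2_tilde = solutions(r, theta0 + 2 * cmath.pi)   # once round z = a
```

After one circuit the pair returns as $\tilde y_1 = \lambda y_1 + y_2$, $\tilde y_2 = \lambda y_2$, which is the Jordan block of (19.31).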


19.2.2 Hypergeometric functions 


Most of the special functions of mathematical physics are special cases of the
hypergeometric function $F(a, b; c; z)$, which may be defined by the series

$$ F(a, b; c; z) = 1 + \frac{a\,b}{1 \cdot c}\, z + \frac{a(a+1)\, b(b+1)}{2!\; c(c+1)}\, z^2
   + \frac{a(a+1)(a+2)\, b(b+1)(b+2)}{3!\; c(c+1)(c+2)}\, z^3 + \cdots $$
$$ = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)} \sum_{n=0}^{\infty} \frac{\Gamma(a+n)\Gamma(b+n)}{\Gamma(c+n)\Gamma(1+n)}\, z^n. \tag{19.34} $$

For general values of $a, b, c$, this series converges for $|z| < 1$, the singularity
restricting the convergence being a branch point at $z = 1$.
Examples:

$$ (1+z)^n = F(-n, b; b; -z), \tag{19.35} $$
$$ \ln(1+z) = z\, F(1, 1; 2; -z), \tag{19.36} $$
$$ z^{-1} \sin^{-1} z = F\left( \tfrac{1}{2}, \tfrac{1}{2}; \tfrac{3}{2}; z^2 \right), \tag{19.37} $$
$$ e^z = \lim_{b \to \infty} F(1, b; 1; z/b), \tag{19.38} $$
$$ P_n(z) = F\left( -n, n+1; 1; \frac{1-z}{2} \right), \tag{19.39} $$

where in the last line $P_n$ is the Legendre polynomial.
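The series definition is straightforward to sum numerically inside the unit disc, and doing so gives a quick check of the examples. The sketch below uses a plain term-by-term recursion; the function name `hyp2f1`, the stopping tolerance and the sample arguments are illustrative assumptions, and no attempt is made at analytic continuation beyond $|z| < 1$.

```python
import math

def hyp2f1(a, b, c, z, max_terms=2000, tol=1e-16):
    """Sum the hypergeometric series term by term; for |z| < 1 or terminating series."""
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        # ratio of successive terms: (a+n)(b+n) z / ((c+n)(n+1))
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if term == 0.0 or abs(term) < tol * abs(total):
            break
    return total

binomial = hyp2f1(-3, 1.7, 1.7, -0.4)            # (1 + 0.4)^3
logarithm = 0.5 * hyp2f1(1, 1, 2, -0.5)          # ln(1.5)
arcsine = hyp2f1(0.5, 0.5, 1.5, 0.6 ** 2)        # asin(0.6)/0.6
legendre = hyp2f1(-2, 3, 1, (1 - 0.3) / 2)       # P_2(0.3) = (3*0.09 - 1)/2
exponential = hyp2f1(1, 1e6, 1, 1.0 / 1e6)       # approaches e as b grows
```

With $b = 10^6$ the last value equals $(1 - 10^{-6})^{-10^6}$, which differs from $e$ by about one part in $10^6$.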

For future reference, note that expanding the integrand on the right hand
side as a power series in $z$ and integrating term by term shows that $F(a, b; c; z)$
has the integral representation

$$ F(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)} \int_0^1 (1 - tz)^{-a}\, t^{b-1} (1-t)^{c-b-1}\, dt. \tag{19.40} $$

If $\mathrm{Re}\, c > \mathrm{Re}\,(a+b)$, we may set $z = 1$ in this integral to get

$$ F(a, b; c; 1) = \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)}. \tag{19.41} $$

The hypergeometric function is a solution of the second-order differential
equation

$$ z(1-z)\, y'' + [c - (a+b+1) z]\, y' - ab\, y = 0. \tag{19.42} $$

This equation has regular singular points at $z = 0, 1, \infty$. Provided that $1 - c$
is not an integer, the general solution is

$$ y = A\, F(a, b; c; z) + B\, z^{1-c} F(b-c+1, a-c+1; 2-c; z). \tag{19.43} $$


A differential equation possessing only regular singular points is known as a 
Fuchsian equation. The hypergeometric equation is a particular case of the 
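general Fuchsian equation with three¹ regular singularities at $z = z_1, z_2, z_3$.
This equation is

$$ y'' + P(z)\, y' + Q(z)\, y = 0, \tag{19.44} $$

where

$$ P(z) = \frac{1 - \alpha - \alpha'}{z - z_1} + \frac{1 - \beta - \beta'}{z - z_2} + \frac{1 - \gamma - \gamma'}{z - z_3}, $$
$$ Q(z) = \frac{1}{(z - z_1)(z - z_2)(z - z_3)} \times
   \left( \frac{(z_1 - z_2)(z_1 - z_3)\, \alpha\alpha'}{z - z_1}
        + \frac{(z_2 - z_3)(z_2 - z_1)\, \beta\beta'}{z - z_2}
        + \frac{(z_3 - z_1)(z_3 - z_2)\, \gamma\gamma'}{z - z_3} \right). \tag{19.45} $$

The parameters are subject to the constraint $\alpha + \beta + \gamma + \alpha' + \beta' + \gamma' = 1$,
which ensures that $z = \infty$ is not a singular point of the equation. This
equation is sometimes called Riemann's P-equation. The P probably stands
for Papperitz, who discovered it.

¹ The Fuchsian equation with two regular singularities is

$$ y'' + p(z)\, y' + q(z)\, y = 0 $$

with

$$ p(z) = \frac{1 - \alpha - \alpha'}{z - z_1} + \frac{1 + \alpha + \alpha'}{z - z_2}, \qquad
   q(z) = \frac{\alpha\alpha' (z_1 - z_2)^2}{(z - z_1)^2 (z - z_2)^2}. $$

Its general solution is

$$ y = A \left( \frac{z - z_1}{z - z_2} \right)^{\alpha} + B \left( \frac{z - z_1}{z - z_2} \right)^{\alpha'}. $$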

The indicial equation relative to the regular singular point at $z_1$ is

$$ r(r-1) + (1 - \alpha - \alpha')\, r + \alpha\alpha' = 0, \tag{19.46} $$

and has roots $r = \alpha, \alpha'$. From this, we deduce that Riemann's equation
has solutions that behave like $(z - z_1)^{\alpha}$ and $(z - z_1)^{\alpha'}$ near $z_1$. Similarly,
there are solutions that behave like $(z - z_2)^{\beta}$ and $(z - z_2)^{\beta'}$ near $z_2$, and like
$(z - z_3)^{\gamma}$ and $(z - z_3)^{\gamma'}$ near $z_3$. The solution space of Riemann's equation is
traditionally denoted by the Riemann "P" symbol

$$ y = P \left\{ \begin{matrix} z_1 & z_2 & z_3 & \\ \alpha & \beta & \gamma & z \\ \alpha' & \beta' & \gamma' & \end{matrix} \right\}, \tag{19.47} $$

where the six quantities $\alpha, \beta, \gamma, \alpha', \beta', \gamma'$ are called the exponents of the
solution. A particular solution is

$$ y = \left( \frac{z - z_1}{z - z_2} \right)^{\alpha} \left( \frac{z - z_3}{z - z_2} \right)^{\gamma}
   F\left( \alpha + \beta + \gamma,\ \alpha + \beta' + \gamma;\ 1 + \alpha - \alpha';\ \frac{(z - z_1)(z_3 - z_2)}{(z - z_2)(z_3 - z_1)} \right). \tag{19.48} $$

By permuting the triples $(z_1, \alpha, \alpha')$, $(z_2, \beta, \beta')$, $(z_3, \gamma, \gamma')$, and within them
interchanging the pairs $\alpha \leftrightarrow \alpha'$, $\gamma \leftrightarrow \gamma'$, we may find a total² of $6 \times 4 = 24$
solutions of this form. They are called the Kummer solutions. Only two of
them can be linearly independent, and a large part of the theory of special
functions is devoted to obtaining the linear relations between them.

² The interchange $\beta \leftrightarrow \beta'$ leaves the hypergeometric function invariant, and so does not
give a new solution.

It is straightforward, but a trifle tedious, to show that


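$$ (z - z_1)^r (z - z_2)^s (z - z_3)^t\,
   P \left\{ \begin{matrix} z_1 & z_2 & z_3 & \\ \alpha & \beta & \gamma & z \\ \alpha' & \beta' & \gamma' & \end{matrix} \right\}
 = P \left\{ \begin{matrix} z_1 & z_2 & z_3 & \\ \alpha + r & \beta + s & \gamma + t & z \\ \alpha' + r & \beta' + s & \gamma' + t & \end{matrix} \right\}, \tag{19.49} $$

provided $r + s + t = 0$. Riemann's equation retains its form under Möbius
maps, only the location of the singular points changing. We therefore deduce
that

$$ P \left\{ \begin{matrix} z_1 & z_2 & z_3 & \\ \alpha & \beta & \gamma & z \\ \alpha' & \beta' & \gamma' & \end{matrix} \right\}
 = P \left\{ \begin{matrix} z_1' & z_2' & z_3' & \\ \alpha & \beta & \gamma & z' \\ \alpha' & \beta' & \gamma' & \end{matrix} \right\}, \tag{19.50} $$

where

$$ z' = \frac{az + b}{cz + d}, \quad z_1' = \frac{a z_1 + b}{c z_1 + d}, \quad
   z_2' = \frac{a z_2 + b}{c z_2 + d}, \quad z_3' = \frac{a z_3 + b}{c z_3 + d}. \tag{19.51} $$

By using the Möbius map that takes $(z_1, z_2, z_3) \to (0, 1, \infty)$, and by
extracting powers to shift the exponents, we can reduce the general
eight-parameter Riemann equation to the three-parameter hypergeometric
equation.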

The P symbol for the hypergeometric equation is 


0 ee) 1 
Pi 0 a 0 zs. (19.52) 
l-c b ec-a-—bd 


By using this observation and a suitable Möbius map we see that
\[
F(a, b; a + b + 1 - c; 1 - z)
\]
and
\[
(1 - z)^{c - a - b} F(c - b, c - a; c - a - b + 1; 1 - z)
\]
are also solutions of the hypergeometric equation, each having a pure (as
opposed to a linear combination of) power-law behaviour near $z = 1$. (The
previous solutions had pure power-law behaviours near $z = 0$.) These new
solutions must be linear combinations of the old ones, and we may use

\[
F(a, b; c; 1) = \frac{\Gamma(c)\Gamma(c - a - b)}{\Gamma(c - a)\Gamma(c - b)}, \quad {\rm Re}\,(c - a - b) > 0, \tag{19.53}
\]
together with the trick of substituting $z = 0$ and $z = 1$, to determine the
coefficients and show that
\[
F(a, b; c; z) = \frac{\Gamma(c)\Gamma(c - a - b)}{\Gamma(c - a)\Gamma(c - b)} F(a, b; a + b - c + 1; 1 - z) + (1 - z)^{c - a - b} \frac{\Gamma(c)\Gamma(a + b - c)}{\Gamma(a)\Gamma(b)} F(c - a, c - b; c - a - b + 1; 1 - z). \tag{19.54}
\]
This last equation holds for all values of $a$, $b$, $c$ such that the Gamma functions
make sense.
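
The connection formula (19.54) is easy to test numerically at a point where both the series in $z$ and the series in $1-z$ converge. The following is a minimal sketch using only the Python standard library; the helper names `hyp2f1` and `connection_rhs` and the parameter values are illustrative choices, not part of the text.

```python
import math

def hyp2f1(a, b, c, z, terms=400):
    """Sum the Gauss series for F(a,b;c;z); only valid for |z| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # ratio of successive terms: (a+n)(b+n) z / ((c+n)(n+1))
        term *= (a + n) * (b + n) * z / ((c + n) * (n + 1))
    return total

def connection_rhs(a, b, c, z):
    """Right-hand side of (19.54), built from the two pure-power solutions at z = 1."""
    g = math.gamma
    c0 = g(c) * g(c - a - b) / (g(c - a) * g(c - b))
    c1 = g(c) * g(a + b - c) / (g(a) * g(b))
    return (c0 * hyp2f1(a, b, a + b - c + 1, 1 - z)
            + c1 * (1 - z) ** (c - a - b)
            * hyp2f1(c - a, c - b, c - a - b + 1, 1 - z))

# Arbitrary test values with non-integer c - a - b (an assumption of this sketch).
a, b, c, z = 0.3, 0.45, 1.2, 0.6
lhs = hyp2f1(a, b, c, z)
rhs = connection_rhs(a, b, c, z)
print(lhs, rhs)
```

The two printed numbers should agree to rounding error; the check fails if $c-a-b$ is an integer, because the Gamma functions then have poles.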


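A complete set of pure-power solutions to the hypergeometric equation
can be taken to be

\[
\begin{aligned}
\phi^{(0)}_0(z) &= F(a, b; c; z), \\
\phi^{(0)}_1(z) &= z^{1-c} F(a + 1 - c, b + 1 - c; 2 - c; z), \\
\phi^{(1)}_0(z) &= F(a, b; 1 - c + a + b; 1 - z), \\
\phi^{(1)}_1(z) &= (1 - z)^{c-a-b} F(c - a, c - b; 1 + c - a - b; 1 - z), \\
\phi^{(\infty)}_0(z) &= z^{-a} F(a, a + 1 - c; 1 + a - b; z^{-1}), \\
\phi^{(\infty)}_1(z) &= z^{-b} F(b, b + 1 - c; 1 - a + b; z^{-1}).
\end{aligned} \tag{19.55}
\]

Here the suffix denotes the point at which the solution has pure-power
behaviour. The connection coefficients relating the solutions to one another are
then

\[
\begin{aligned}
\phi^{(0)}_0 &= \frac{\Gamma(c)\Gamma(c - a - b)}{\Gamma(c - a)\Gamma(c - b)}\, \phi^{(1)}_0 + \frac{\Gamma(c)\Gamma(a + b - c)}{\Gamma(a)\Gamma(b)}\, \phi^{(1)}_1, \\
\phi^{(0)}_1 &= \frac{\Gamma(2 - c)\Gamma(c - a - b)}{\Gamma(1 - a)\Gamma(1 - b)}\, \phi^{(1)}_0 + \frac{\Gamma(2 - c)\Gamma(a + b - c)}{\Gamma(a + 1 - c)\Gamma(b + 1 - c)}\, \phi^{(1)}_1,
\end{aligned} \tag{19.56}
\]
and
\[
\begin{aligned}
\phi^{(0)}_0 &= e^{i\pi a} \frac{\Gamma(c)\Gamma(b - a)}{\Gamma(c - a)\Gamma(b)}\, \phi^{(\infty)}_0 + e^{i\pi b} \frac{\Gamma(c)\Gamma(a - b)}{\Gamma(c - b)\Gamma(a)}\, \phi^{(\infty)}_1, \\
\phi^{(0)}_1 &= e^{i\pi(a + 1 - c)} \frac{\Gamma(2 - c)\Gamma(b - a)}{\Gamma(b + 1 - c)\Gamma(1 - a)}\, \phi^{(\infty)}_0 + e^{i\pi(b + 1 - c)} \frac{\Gamma(2 - c)\Gamma(a - b)}{\Gamma(a + 1 - c)\Gamma(1 - b)}\, \phi^{(\infty)}_1.
\end{aligned} \tag{19.57}
\]

These relations assume that ${\rm Im}\, z > 0$. The signs in the exponential factors
must be reversed when ${\rm Im}\, z < 0$.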

Example: The Pöschl-Teller problem for general positive $l$. A substitution
$z = (1 + e^{2x})^{-1}$ shows that the Pöschl-Teller Schrödinger equation

\[
\left(-\frac{d^2}{dx^2} - l(l + 1)\,{\rm sech}^2 x\right)\psi = E\psi \tag{19.58}
\]
has solution
\[
\psi(x) = (1 + e^{2x})^{-\kappa/2} (1 + e^{-2x})^{-\kappa/2} F\left(\kappa + l + 1, \kappa - l; \kappa + 1; \frac{1}{1 + e^{2x}}\right), \tag{19.59}
\]
where $E = -\kappa^2$. This solution behaves near $x = \infty$ as
\[
\psi \sim e^{-\kappa x} F(\kappa + l + 1, \kappa - l; \kappa + 1; 0) = e^{-\kappa x}. \tag{19.60}
\]

We use the connection formula (19.54) to see that it behaves in the vicinity
of $x = -\infty$ as

\[
\begin{aligned}
\psi &\sim e^{\kappa x} F(\kappa + l + 1, \kappa - l; \kappa + 1; 1 - e^{2x}) \\
&\to e^{\kappa x} \frac{\Gamma(\kappa + 1)\Gamma(-\kappa)}{\Gamma(-l)\Gamma(1 + l)} + e^{-\kappa x} \frac{\Gamma(\kappa + 1)\Gamma(\kappa)}{\Gamma(\kappa + l + 1)\Gamma(\kappa - l)}.
\end{aligned} \tag{19.61}
\]

To find the bound-state spectrum, assume that $\kappa$ is positive. Then
$E = -\kappa^2$ will be an eigenvalue provided that the coefficient of $e^{-\kappa x}$ near $x = -\infty$
vanishes. In other words, the spectrum follows from the condition
\[
\frac{\Gamma(\kappa + 1)\Gamma(\kappa)}{\Gamma(\kappa + l + 1)\Gamma(\kappa - l)} = 0. \tag{19.62}
\]
This condition is satisfied for a finite set $\kappa_n$, $n = 1, \ldots, [l]$ (where $[l]$ denotes
the integer part of $l$) at which $\kappa$ is positive but $\kappa - l$ is zero or a negative
integer.
By setting $\kappa = -ik$, for real $k$ we find the scattering solution
\[
\psi(x) = \begin{cases} e^{ikx} + r(k)e^{-ikx}, & x \ll 0, \\ t(k)e^{ikx}, & x \gg 0, \end{cases} \tag{19.63}
\]
where
\[
r(k) = \frac{\Gamma(l + 1 - ik)\Gamma(-ik - l)\Gamma(ik)}{\Gamma(-l)\Gamma(1 + l)\Gamma(-ik)} = -\frac{\sin \pi l}{\pi} \frac{\Gamma(l + 1 - ik)\Gamma(-ik - l)\Gamma(ik)}{\Gamma(-ik)}, \tag{19.64}
\]
and
\[
t(k) = \frac{\Gamma(l + 1 - ik)\Gamma(-ik - l)}{\Gamma(1 - ik)\Gamma(-ik)}. \tag{19.65}
\]
Whenever $l$ is a (positive) integer, the divergent factor of $\Gamma(-l)$ in the
denominator of $r(k)$ causes the reflected wave to vanish. This is something we
had observed in earlier chapters. In this reflectionless case, the transmission
coefficient $t(k)$ reduces to a phase
\[
t(k) = \frac{(-ik + 1)(-ik + 2) \cdots (-ik + l)}{(-ik - 1)(-ik - 2) \cdots (-ik - l)}. \tag{19.66}
\]
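
As a sanity check on the reflectionless case, one can compare (19.66) for $l = 1$ with an explicit scattering state. The sketch below assumes the standard closed form $\psi = (\tanh x - ik)e^{ikx}$ for the $l = 1$ potential, which is quoted here rather than derived in the text; the function names are illustrative.

```python
import cmath
import math

def transmission(k, l):
    """t(k) from (19.66) for a positive integer l."""
    t = 1.0 + 0.0j
    for j in range(1, l + 1):
        t *= (-1j * k + j) / (-1j * k - j)
    return t

def psi_l1(x, k):
    """Assumed closed-form l = 1 scattering state (tanh x - i k) e^{i k x}."""
    return (math.tanh(x) - 1j * k) * cmath.exp(1j * k * x)

def residual(x, k, h=1e-3):
    """Finite-difference residual of -psi'' - 2 sech^2(x) psi - k^2 psi."""
    d2 = (psi_l1(x + h, k) - 2 * psi_l1(x, k) + psi_l1(x - h, k)) / h ** 2
    return -d2 - 2.0 / math.cosh(x) ** 2 * psi_l1(x, k) - k ** 2 * psi_l1(x, k)

k = 1.3
# Ratio of the e^{ikx} amplitudes far to the right and far to the left.
ratio = (math.tanh(20.0) - 1j * k) / (math.tanh(-20.0) - 1j * k)
print(ratio, transmission(k, 1), abs(transmission(k, 3)))
```

The ratio of outgoing to incoming amplitudes reproduces $t(k)$, and $|t(k)| = 1$ for any integer $l$, as expected when there is no reflected wave.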


19.3 Solving ODE’s via contour integrals 


Our task in this section is to understand the origin of contour integral solu- 
tions, such as the expression 


\[
F(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c - b)} \int_0^1 (1 - tz)^{-a} t^{b-1} (1 - t)^{c-b-1}\, dt, \tag{19.67}
\]


which we have previously seen for the hypergeometric equation. 
Suppose that we are given a differential operator 


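\[
L_z = \partial^2_{zz} + p(z)\partial_z + q(z) \tag{19.68}
\]

and seek a solution of $L_z u = 0$ as an integral

\[
u(z) = \int_\Gamma F(z, t)\, dt \tag{19.69}
\]

over some contour $\Gamma$. If we can find a kernel $F$ such that

\[
L_z F = \frac{\partial Q}{\partial t} \tag{19.70}
\]

for some function $Q(z, t)$, then

\[
L_z u = \int_\Gamma L_z F(z, t)\, dt = \int_\Gamma \left(\frac{\partial Q}{\partial t}\right) dt = [Q]_\Gamma. \tag{19.71}
\]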


Thus, if $Q$ vanishes at both ends of the contour, if it takes the same value at
the two ends, or if the contour is closed and thus has no ends, then we have
succeeded in our quest.

Example: Consider Legendre's equation,

\[
L_z u \equiv (1 - z^2)\frac{d^2u}{dz^2} - 2z\frac{du}{dz} + \nu(\nu + 1)u = 0. \tag{19.72}
\]

The identity

\[
L_z \left\{\frac{(t^2 - 1)^\nu}{(t - z)^{\nu+1}}\right\} = (\nu + 1)\frac{d}{dt}\left\{\frac{(t^2 - 1)^{\nu+1}}{(t - z)^{\nu+2}}\right\} \tag{19.73}
\]

shows that

\[
P_\nu(z) = \frac{1}{2\pi i} \int_\Gamma \left\{\frac{(t^2 - 1)^\nu}{2^\nu (t - z)^{\nu+1}}\right\} dt \tag{19.74}
\]

will be a solution of Legendre's equation, provided that

\[
[Q]_\Gamma \equiv \left[\frac{(t^2 - 1)^{\nu+1}}{(t - z)^{\nu+2}}\right]_\Gamma = 0. \tag{19.75}
\]

We could, for example, take a contour that encircles the points $t = z$ and $t = 1$,
but excludes the point $t = -1$. On going round this contour, the numerator
acquires a phase of $e^{2\pi i(\nu+1)}$, while the denominator of $[Q]_\Gamma$ acquires a phase
of $e^{2\pi i(\nu+2)}$. The net phase change is therefore $e^{-2\pi i} = 1$. The function in the
integrated-out part is therefore single valued, and so the integrated-out part
vanishes. When $\nu$ is an integer, Cauchy's formula shows that (19.74) reduces
to

\[
P_n(z) = \frac{1}{2^n n!} \frac{d^n}{dz^n} (z^2 - 1)^n, \tag{19.76}
\]

which is Rodrigues' formula for the Legendre polynomials.
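
For integer $\nu = n$ the integrand in (19.74) has no branch points, so any small circle around $t = z$ serves as the contour. A minimal numerical sketch (the function name and sample count are illustrative) evaluates the integral by the trapezoid rule and compares it with the familiar polynomials $P_2(z) = (3z^2 - 1)/2$ and $P_3(z) = (5z^3 - 3z)/2$:

```python
import cmath
import math

def legendre_schlaefli(n, z, radius=1.0, samples=256):
    """Evaluate (19.74) on the circle t = z + radius * e^{i phi} for integer n."""
    total = 0.0 + 0.0j
    for j in range(samples):
        e = cmath.exp(2j * math.pi * j / samples)
        t = z + radius * e
        dt = 1j * radius * e * (2 * math.pi / samples)  # dt = i r e^{i phi} dphi
        total += (t * t - 1) ** n / (2 ** n * (t - z) ** (n + 1)) * dt
    return total / (2j * math.pi)

z = 0.4
p2 = legendre_schlaefli(2, z)
p3 = legendre_schlaefli(3, z)
print(p2, (3 * z ** 2 - 1) / 2)
print(p3, (5 * z ** 3 - 3 * z) / 2)
```

Because the integrand is a trigonometric polynomial in the angle, the equally spaced trapezoid rule is exact here up to rounding.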




Figure 19.2: Figure-of-eight contour for $Q_\nu(z)$.


The figure-of-eight contour shown in figure 19.2 gives us a second solution

\[
Q_\nu(z) = \frac{1}{4i \sin \nu\pi} \int_\Gamma \left\{\frac{(t^2 - 1)^\nu}{2^\nu (z - t)^{\nu+1}}\right\} dt, \quad \nu \notin {\bf Z}. \tag{19.77}
\]
Here we define $\arg(t - 1)$ and $\arg(t + 1)$ to be zero for $t > 1$ and $t > -1$
respectively. The integrated-out part vanishes because the phase gained
by the $(t^2 - 1)^{\nu+1}$ in the numerator of $[Q]_\Gamma$ during the clockwise winding
about $t = 1$ is undone during the anti-clockwise winding about $t = -1$, and,
provided that $z$ lies outside the contour, there is no phase change in the
$(z - t)^{(\nu+2)}$ in the denominator.

When $\nu$ is real and positive the contributions from the circular arcs
surrounding $t = \pm 1$ become negligible as we shrink this new contour down onto
the real axis. We observe that, with the arguments of $(t \pm 1)$ as specified
above,

\[
(t^2 - 1)^\nu \to (1 - t^2)^\nu e^{-i\pi\nu}
\]

for the left-going part of the contour on the real axis between $t = +1$ and
$t = -1$, and

\[
(t^2 - 1)^\nu \to (1 - t^2)^\nu e^{-i\pi\nu} e^{2\pi i\nu} = (1 - t^2)^\nu e^{i\pi\nu}
\]

after we have rounded the branch point at $t = -1$ and are returning along
the real axis to $t = +1$. Thus, after the shrinking manoeuvre, the integral
(19.77) becomes

\[
Q_\nu(z) = \frac{1}{2} \int_{-1}^{1} \frac{(1 - t^2)^\nu}{2^\nu (z - t)^{\nu+1}}\, dt, \quad \nu > 0. \tag{19.78}
\]

In contrast to (19.77), this last formula continues to make sense when $\nu$ is
a positive integer. It then provides a convenient definition of $Q_n(z)$, the
Legendre function of the second kind (see exercise 18.3).
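
Formula (19.78) can be checked against the standard closed forms $Q_0(z) = \frac12\ln\frac{z+1}{z-1}$ and $Q_1(z) = \frac z2\ln\frac{z+1}{z-1} - 1$, which are quoted here as known results rather than taken from the text. The sketch below uses a hand-written Simpson rule and real $z > 1$; including $\nu = 0$ in the test is an assumption that goes slightly beyond the stated range $\nu > 0$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def legendre_q(nu, z):
    """Evaluate (19.78) for real z > 1."""
    integrand = lambda t: (1 - t * t) ** nu / (2 ** nu * (z - t) ** (nu + 1))
    return 0.5 * simpson(integrand, -1.0, 1.0)

z = 2.0
log_term = math.log((z + 1) / (z - 1))
q0, q1 = legendre_q(0, z), legendre_q(1, z)
print(q0, 0.5 * log_term)
print(q1, 0.5 * z * log_term - 1.0)
```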

It is usually hard to find a suitable $F(z, t)$ in one fell swoop. (The identity
(19.73) exploited in the example is not exactly obvious!) An easier strategy
is to seek a solution in the form of an integral operator with a kernel $K$ acting
on a function $v(t)$. Thus we try

\[
u(z) = \int_\Gamma K(z, t) v(t)\, dt. \tag{19.79}
\]

Suppose that $L_z K(z, t) = M_t K(z, t)$, where $M_t$ is a differential operator in $t$
that does not involve $z$. The operator $M_t$ will have a formal adjoint $M_t^\dagger$
such that

\[
\int_\Gamma v (M_t K)\, dt - \int_\Gamma K (M_t^\dagger v)\, dt = [Q(K, v)]_\Gamma. \tag{19.80}
\]

(This is Lagrange's identity.) Now

\[
\begin{aligned}
L_z u &= \int_\Gamma L_z K(z, t)\, v\, dt \\
&= \int_\Gamma (M_t K(z, t))\, v\, dt \\
&= \int_\Gamma K(z, t)(M_t^\dagger v)\, dt + [Q(K, v)]_\Gamma.
\end{aligned}
\]

We can therefore solve the original equation, $L_z u = 0$, by finding a $v$ such
that $(M_t^\dagger v) = 0$, and a contour with endpoints such that $[Q(K, v)]_\Gamma = 0$.
This may sound complicated, but an artful choice of $K$ can make it much
simpler than solving the original problem. A single $K$ will often work for
families of related equations.
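Example: We will solve

\[
L_z u = \frac{d^2u}{dz^2} - z\frac{du}{dz} + \nu u = 0, \tag{19.81}
\]
by using the kernel $K(z, t) = e^{-zt}$. It is easy to check that $L_z K(z, t) =
M_t K(z, t)$ where

\[
M_t = t^2 - t\frac{\partial}{\partial t} + \nu, \tag{19.82}
\]
and so
\[
M_t^\dagger = t^2 + \frac{\partial}{\partial t} t + \nu = t\frac{\partial}{\partial t} + (t^2 + \nu + 1). \tag{19.83}
\]
The equation $M_t^\dagger v = 0$ has a solution
\[
v(t) = t^{-(\nu+1)} e^{-\frac{1}{2}t^2}, \tag{19.84}
\]
and so
\[
u = \int_\Gamma t^{-(\nu+1)} e^{-(zt + \frac{1}{2}t^2)}\, dt \tag{19.85}
\]

for some suitable $\Gamma$.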
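
When $\nu = n$ is a non-negative integer the integrand of (19.85) is single valued, so a closed contour around $t = 0$ has no integrated-out part. The following sketch (illustrative names, unit-circle contour chosen for convenience) evaluates the integral by the trapezoid rule, checks the differential equation (19.81) by finite differences, and compares the $n = 2$ case with the residue value $i\pi(z^2 - 1)$, obtained by expanding $e^{-zt - t^2/2}$ to second order in $t$.

```python
import cmath
import math

def u_contour(n, z, samples=256):
    """Evaluate (19.85) on the unit circle |t| = 1 for integer nu = n >= 0."""
    total = 0.0 + 0.0j
    for j in range(samples):
        t = cmath.exp(2j * math.pi * j / samples)
        dt = 1j * t * (2 * math.pi / samples)
        total += t ** (-(n + 1)) * cmath.exp(-(z * t + 0.5 * t * t)) * dt
    return total

def residual(n, z, h=1e-3):
    """Central-difference estimate of u'' - z u' + n u."""
    up, u0, um = u_contour(n, z + h), u_contour(n, z), u_contour(n, z - h)
    return (up - 2 * u0 + um) / h ** 2 - z * (up - um) / (2 * h) + n * u0

z = 0.7
u2 = u_contour(2, z)
print(u2, 1j * math.pi * (z ** 2 - 1))
print(abs(residual(2, z)))
```

Up to normalization the result is a Hermite polynomial in $z$, which is why the residual vanishes to rounding accuracy.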


19.3.1 Bessel functions 


As an illustration of the general method we will explore the theory of Bessel
functions. Bessel functions are members of the family of confluent
hypergeometric functions, obtained by letting the two regular singular points $z_2$, $z_3$ of
the Riemann-Papperitz equation coalesce at infinity. The resulting singular
point is no longer regular, and confluent hypergeometric functions have an
essential singularity at infinity. The confluent hypergeometric equation is

\[
z y'' + (c - z) y' - a y = 0, \tag{19.86}
\]
with solution
\[
\Phi(a, c; z) = \frac{\Gamma(c)}{\Gamma(a)} \sum_{n=0}^{\infty} \frac{\Gamma(a + n)}{\Gamma(c + n)\Gamma(n + 1)} z^n. \tag{19.87}
\]
Observe that
\[
\Phi(a, c; z) = \lim_{b \to \infty} F(a, b; c; z/b). \tag{19.88}
\]

The second solution, provided that $c$ is not an integer, is
\[
z^{1-c}\, \Phi(a - c + 1, 2 - c; z). \tag{19.89}
\]

Other functions of this family are the parabolic cylinder functions, which
in special cases reduce to $e^{-z^2/4}$ times the Hermite polynomials, the error
function,
\[
{\rm erf}(z) = \int_0^z e^{-t^2} dt = z\, \Phi\left(\frac{1}{2}, \frac{3}{2}; -z^2\right) \tag{19.90}
\]

and the Laguerre polynomials,

\[
L^m_n = \frac{\Gamma(n + m + 1)}{\Gamma(n + 1)\Gamma(m + 1)} \Phi(-n, m + 1; z). \tag{19.91}
\]
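
The series (19.87) is simple to sum directly, which allows quick checks of (19.88), (19.90) and (19.91). In the sketch below the helper names are illustrative. Note that the text's ${\rm erf}$ is defined without the conventional factor $2/\sqrt{\pi}$, so it equals $(\sqrt{\pi}/2)$ times Python's `math.erf`.

```python
import math

def kummer_phi(a, c, z, terms=200):
    """Series (19.87), written as sum_n (a)_n / ((c)_n n!) z^n."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * z / ((c + n) * (n + 1))
    return total

def hyp2f1(a, b, c, z, terms=200):
    """Gauss series for F(a,b;c;z), valid for |z| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) * z / ((c + n) * (n + 1))
    return total

z = 1.2
erf_text = z * kummer_phi(0.5, 1.5, -z * z)        # right-hand side of (19.90)
erf_ref = 0.5 * math.sqrt(math.pi) * math.erf(z)   # integral of exp(-t^2) from 0 to z

big_b = 1.0e6                                      # stand-in for the limit b -> infinity
confluent_limit = hyp2f1(0.3, big_b, 1.7, 0.8 / big_b)

laguerre2 = kummer_phi(-2, 1, z)                   # (19.91) with n = 2, m = 0
print(erf_text, erf_ref)
print(confluent_limit, kummer_phi(0.3, 1.7, 0.8))
print(laguerre2, 1 - 2 * z + z * z / 2)
```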


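Bessel's equation involves the operator

\[
L_z = \partial^2_{zz} + \frac{1}{z}\partial_z + \left(1 - \frac{\nu^2}{z^2}\right). \tag{19.92}
\]

Experience shows that for Bessel functions a useful kernel is

\[
K(z, t) = \left(\frac{z}{2}\right)^\nu \exp\left(t - \frac{z^2}{4t}\right). \tag{19.93}
\]

Then

\[
L_z K(z, t) = \left(\partial_t - \frac{\nu + 1}{t}\right) K(z, t) \tag{19.94}
\]
so, again, $M$ is a first order operator, which is simpler to deal with than the
original second order $L_z$. In this case the adjoint is

\[
M^\dagger = \left(-\partial_t - \frac{\nu + 1}{t}\right) \tag{19.95}
\]
and we need a $v$ such that
\[
M^\dagger v = -\left(\partial_t + \frac{\nu + 1}{t}\right) v = 0. \tag{19.96}
\]

Clearly, $v = t^{-\nu-1}$ will work. The integrated out part is
\[
[Q(K, v)]_a^b = \left[\left(\frac{z}{2}\right)^\nu t^{-\nu-1} \exp\left(t - \frac{z^2}{4t}\right)\right]_a^b \tag{19.97}
\]

and we see that
\[
J_\nu(z) = \frac{1}{2\pi i} \left(\frac{z}{2}\right)^\nu \int_C t^{-\nu-1} e^{\left(t - \frac{z^2}{4t}\right)}\, dt \tag{19.98}
\]
solves Bessel's equation provided we use a suitable contour.
We can take for $\Gamma$ a curve $C$ starting at $-\infty - i\epsilon$ and ending at $-\infty + i\epsilon$,
and surrounding the branch cut of $t^{-\nu-1}$, which we take as the negative $t$
axis.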



Figure 19.3: Contour for solving Bessel equation. 


This contour works because $Q$ is zero at both ends of the contour.
A cosmetic rewrite $t = uz/2$ gives

\[
J_\nu(z) = \frac{1}{2\pi i} \int_C u^{-\nu-1} e^{\frac{z}{2}\left(u - \frac{1}{u}\right)}\, du. \tag{19.99}
\]
For $\nu$ an integer, there is no discontinuity across the cut, so we can ignore it
and take $C$ to be the unit circle. Then, recognizing the resulting
\[
J_n(z) = \frac{1}{2\pi i} \oint_{|u|=1} u^{-n-1} e^{\frac{z}{2}\left(u - \frac{1}{u}\right)}\, du \tag{19.100}
\]
to be a Laurent coefficient, we obtain the familiar Bessel-function generating
function
\[
e^{\frac{z}{2}\left(u - \frac{1}{u}\right)} = \sum_{n=-\infty}^{\infty} J_n(z) u^n. \tag{19.101}
\]
When $\nu$ is not an integer, we see why we need a branch cut integral.
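
The Laurent-coefficient formula (19.100) can be compared with the standard power series $J_n(z) = \sum_k (-1)^k (z/2)^{2k+n}/(k!\,(k+n)!)$, which is quoted here as a known result. A minimal sketch with illustrative function names:

```python
import cmath
import math

def bessel_contour(n, z, samples=128):
    """Evaluate (19.100) on the unit circle |u| = 1 by the trapezoid rule."""
    total = 0.0 + 0.0j
    for j in range(samples):
        u = cmath.exp(2j * math.pi * j / samples)
        du = 1j * u * (2 * math.pi / samples)
        total += u ** (-n - 1) * cmath.exp(0.5 * z * (u - 1 / u)) * du
    return (total / (2j * math.pi)).real   # real for real z

def bessel_series(n, z, terms=40):
    """Standard power series for J_n(z), n a non-negative integer."""
    return sum((-1) ** k * (z / 2) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

z = 2.5
pairs = [(bessel_contour(n, z), bessel_series(n, z)) for n in (0, 1, 3)]
for contour_value, series_value in pairs:
    print(contour_value, series_value)
```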
If we set $u = e^w$ we get

\[
J_\nu(z) = \frac{1}{2\pi i} \int_{C'} dw\, e^{z \sinh w - \nu w}, \tag{19.102}
\]

where $C'$ goes from $(\infty - i\pi)$ to $-i\pi$ to $+i\pi$ to $(\infty + i\pi)$.




Figure 19.4: Bessel contour after the changes of variables. 


If we set $w = t \pm i\pi$ on the horizontals and $w = i\theta$ on the vertical part,
we can rewrite this as

\[
J_\nu(z) = \frac{1}{\pi} \int_0^\pi \cos(\nu\theta - z \sin\theta)\, d\theta - \frac{\sin \nu\pi}{\pi} \int_0^\infty e^{-\nu t - z \sinh t}\, dt. \tag{19.103}
\]
All these are standard formulae for the Bessel function but their origins would
be hard to understand without the contour solutions trick.
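
Formula (19.103) can be tested at non-integer order against the elementary closed form $J_{1/2}(z) = \sqrt{2/(\pi z)}\,\sin z$, a standard result not derived in the text. The sketch below uses a hand-written Simpson rule, takes real $z > 0$, and cuts the infinite $t$ range off at an assumed `t_max`, beyond which the integrand is negligible.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def bessel_schlaefli(nu, z, t_max=12.0):
    """Evaluate (19.103) for real z > 0, truncating the second integral at t_max."""
    first = simpson(lambda th: math.cos(nu * th - z * math.sin(th)), 0.0, math.pi)
    second = simpson(lambda t: math.exp(-nu * t - z * math.sinh(t)), 0.0, t_max)
    return first / math.pi - math.sin(nu * math.pi) / math.pi * second

z = 1.5
j_half = bessel_schlaefli(0.5, z)
j_half_exact = math.sqrt(2 / (math.pi * z)) * math.sin(z)
print(j_half, j_half_exact)
```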

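When $\nu$ becomes an integer, the functions $J_\nu(z)$ and $J_{-\nu}(z)$ are no longer
independent. In order to have a Bessel-equation solution that retains its
independence from $J_\nu(z)$, even as $\nu$ becomes a whole number, we define the
Neumann function by

\[
\begin{aligned}
N_\nu(z) &\stackrel{\rm def}{=} \frac{J_\nu(z) \cos \nu\pi - J_{-\nu}(z)}{\sin \nu\pi} \\
&= \frac{\cot \nu\pi}{\pi} \int_0^\pi \cos(\nu\theta - z \sin\theta)\, d\theta - \frac{{\rm cosec}\, \nu\pi}{\pi} \int_0^\pi \cos(\nu\theta + z \sin\theta)\, d\theta \\
&\quad - \frac{\cos \nu\pi}{\pi} \int_0^\infty e^{-\nu t - z \sinh t}\, dt - \frac{1}{\pi} \int_0^\infty e^{\nu t - z \sinh t}\, dt.
\end{aligned} \tag{19.104}
\]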
1 0 ™ Jo 


Figure 19.5: Contours defining $H^{(1)}_\nu(z)$ and $H^{(2)}_\nu(z)$.


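Both the Bessel and Neumann functions are real for positive real $x$. As
$x$ becomes large they oscillate as slowly decaying sines and cosines. It is
sometimes convenient to decompose these real functions into solutions that
oscillate as $e^{\pm ix}$. We therefore define the Hankel functions by
\[
\begin{aligned}
H^{(1)}_\nu(z) &= \frac{1}{i\pi} \int_{-\infty}^{\infty + i\pi} e^{z \sinh w - \nu w}\, dw, \quad |\arg z| < \pi/2, \\
H^{(2)}_\nu(z) &= -\frac{1}{i\pi} \int_{-\infty}^{\infty - i\pi} e^{z \sinh w - \nu w}\, dw, \quad |\arg z| < \pi/2.
\end{aligned} \tag{19.105}
\]

Then
\[
\frac{1}{2}\left(H^{(1)}_\nu(z) + H^{(2)}_\nu(z)\right) = J_\nu(z), \qquad \frac{1}{2i}\left(H^{(1)}_\nu(z) - H^{(2)}_\nu(z)\right) = N_\nu(z). \tag{19.106}
\]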


19.4 Asymptotic expansions 


We often need to understand the behaviour of solutions of differential
equations and functions, such as $J_\nu(x)$, when $x$ takes values that are very large,
or very small. This is the subject of asymptotics.

As an introduction to this art, consider the function

\[
Z(\lambda) = \int_{-\infty}^{\infty} e^{-x^2 - \lambda x^4}\, dx. \tag{19.107}
\]

Those of you who have taken a course in quantum field theory based on path
integrals will recognize that this is a "toy," 0-dimensional, version of the path
integral for the $\lambda\varphi^4$ model of a self-interacting scalar field. Suppose we wish
to obtain the perturbation expansion for $Z(\lambda)$ as a power series in $\lambda$. We
naturally proceed as follows

\[
\begin{aligned}
Z(\lambda) &= \int_{-\infty}^{\infty} e^{-x^2 - \lambda x^4}\, dx \\
&= \int_{-\infty}^{\infty} e^{-x^2} \sum_{n=0}^{\infty} (-1)^n \frac{\lambda^n x^{4n}}{n!}\, dx \\
&\stackrel{?}{=} \sum_{n=0}^{\infty} (-1)^n \frac{\lambda^n}{n!} \int_{-\infty}^{\infty} e^{-x^2} x^{4n}\, dx \\
&= \sum_{n=0}^{\infty} (-1)^n \frac{\lambda^n}{n!} \Gamma(2n + 1/2).
\end{aligned} \tag{19.108}
\]


Something has clearly gone wrong here! The gamma function $\Gamma(2n + 1/2) \sim
(2n)! \sim 4^n (n!)^2$ overwhelms the $n!$ in the denominator, so the radius of
convergence of the final power series is zero.

The invalid, but popular, manoeuvre is the interchange of the order of
performing the integral and the sum. This interchange cannot be justified
because the sum inside the integral does not converge uniformly on the
domain of integration. Does this mean that the series is useless? It had better
not! All quantum field theory (and most quantum mechanics) perturbation
theory relies on versions of this manoeuvre.

We are saved to some (often adequate) degree because, while the
interchange of integral and sum does not lead to a convergent series, it does lead
to a valid asymptotic expansion. We write

\[
Z(\lambda) \sim \sum_{n=0}^{\infty} (-1)^n \frac{\lambda^n}{n!} \Gamma(2n + 1/2) \tag{19.109}
\]
where
\[
Z(\lambda) \sim \sum_{n=0}^{\infty} a_n \lambda^n \tag{19.110}
\]
is shorthand for the more explicit
\[
Z(\lambda) = \sum_{n=0}^{N} a_n \lambda^n + O\left(\lambda^{N+1}\right), \quad N = 1, 2, 3, \ldots. \tag{19.111}
\]

The "big O" notation,

\[
Z(\lambda) - \sum_{n=0}^{N} a_n \lambda^n = O\left(\lambda^{N+1}\right) \tag{19.112}
\]
as $\lambda \to 0$, means that
\[
\lim_{\lambda \to 0} \left\{\frac{\left|Z(\lambda) - \sum_{n=0}^{N} a_n \lambda^n\right|}{\left|\lambda^{N+1}\right|}\right\} = K < \infty. \tag{19.113}
\]

The basic idea is that, given a convergent power series $\sum_n a_n \lambda^n$ for the
function $f(\lambda)$, we fix the value of $\lambda$ and take more and more terms. The
sum then gets closer to $f(\lambda)$. Given an asymptotic series, on the other hand,
we select a fixed number of terms in the series and then make $\lambda$ smaller and
smaller. The graph of $f(\lambda)$ and the graph of our polynomial approximation
then approach each other. The more terms we take, the sooner they get
close, but for any non-zero $\lambda$ we can never get exactly $f(\lambda)$, no matter how
many terms we take.
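
The difference between a convergent and an asymptotic series is easy to see numerically for the toy integral (19.107). The sketch below evaluates $Z(\lambda)$ by Simpson's rule (the cutoff and step count are assumed values, adequate because the integrand decays like $e^{-x^2}$) and compares it with partial sums of (19.108) at the illustrative value $\lambda = 0.05$.

```python
import math

def z_exact(lam, cutoff=8.0, n=8000):
    """Z(lambda) by Simpson's rule on [-cutoff, cutoff]; the tails are negligible."""
    f = lambda x: math.exp(-x * x - lam * x ** 4)
    h = 2 * cutoff / n
    s = f(-cutoff) + f(cutoff)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(-cutoff + i * h)
    return s * h / 3

def partial_sum(lam, N):
    """Terms n = 0..N of the formal series (19.108)."""
    return sum((-lam) ** n * math.gamma(2 * n + 0.5) / math.factorial(n)
               for n in range(N + 1))

lam = 0.05
exact = z_exact(lam)
errors = [abs(partial_sum(lam, N) - exact) for N in range(31)]
best = min(range(31), key=lambda N: errors[N])
print("best truncation order:", best, "error:", errors[best])
print("error after 31 terms:", errors[30])
```

The error first shrinks as terms are added, reaches a minimum after a handful of terms, and then grows without bound: the series is useful only when truncated near its smallest term.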

We often consider asymptotic expansions where the independent variable
becomes large. Here we have expansions in inverse powers of $x$:

\[
F(x) = \sum_{n=0}^{N} b_n x^{-n} + O\left(x^{-N-1}\right), \quad N = 1, 2, 3, \ldots. \tag{19.114}
\]
In this case

\[
F(x) - \sum_{n=0}^{N} b_n x^{-n} = O\left(x^{-N-1}\right) \tag{19.115}
\]
means that

\[
\lim_{x \to \infty} \left\{\frac{\left|F(x) - \sum_{n=0}^{N} b_n x^{-n}\right|}{\left|x^{-N-1}\right|}\right\} = K < \infty. \tag{19.116}
\]

Again we take a fixed number of terms, and as $x$ becomes large the function
and its approximation get closer.

Observations:

i) Knowledge of the asymptotic expansion gives us useful knowledge about
the function, but does not give us everything. In particular, two distinct
functions may have the same asymptotic expansion. For example, for
small positive $\lambda$, the functions $F(\lambda)$ and $F(\lambda) + ae^{-b/\lambda}$ (with $b > 0$) have exactly the
same asymptotic expansions as series in positive powers of $\lambda$. This is
because $e^{-b/\lambda}$ goes to zero faster than any power of $\lambda$, and so its
asymptotic expansion $\sum_n a_n \lambda^n$ has every coefficient $a_n$ equal to zero. Physicists
commonly say that $e^{-b/\lambda}$ is a non-perturbative function, meaning that
it will not be visible to a perturbation expansion in powers of $\lambda$.

ii) An asymptotic expansion is usually valid only in a sector $a < \arg z < b$
in the complex plane. Different sectors have different expansions. This
is called the Stokes' phenomenon.

The most useful methods for obtaining asymptotic expansions require
that the function to be expanded be given in terms of an integral. This
is the reason why we have stressed the contour-integral method of solving
differential equations. If the integral can be approximated by a Gaussian, we
are led to the method of steepest descents. This technique is best explained
by means of examples.


19.4.1 Stirling’s approximation for n! 


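We start from the integral representation of the Gamma function
\[
\Gamma(x + 1) = \int_0^\infty e^{-t} t^x\, dt. \tag{19.117}
\]
Set $t = x\zeta$, so

\[
\Gamma(x + 1) = x^{x+1} \int_0^\infty e^{x f(\zeta)}\, d\zeta, \tag{19.118}
\]
where
\[
f(\zeta) = \ln \zeta - \zeta. \tag{19.119}
\]
We are going to be interested in evaluating this integral in the limit that
$x \to \infty$ and finding the first term in the asymptotic expansion of $\Gamma(x + 1)$ in
powers of $1/x$. In this limit, the exponential will be dominated by the part
of the integration region near the absolute maximum of $f(\zeta)$. Now, $f(\zeta)$ is
a maximum at $\zeta = 1$ and
\[
f(\zeta) = -1 - \frac{1}{2}(\zeta - 1)^2 + \cdots. \tag{19.120}
\]

So

\[
\begin{aligned}
\Gamma(x + 1) &= x^{x+1} \int_0^\infty e^{-x - \frac{x}{2}(\zeta - 1)^2 + \cdots}\, d\zeta \\
&\approx x^{x+1} e^{-x} \int_{-\infty}^{\infty} e^{-\frac{x}{2}(\zeta - 1)^2}\, d\zeta \\
&= x^{x+1} e^{-x} \sqrt{\frac{2\pi}{x}} \\
&= \sqrt{2\pi}\, x^{x+1/2} e^{-x}.
\end{aligned} \tag{19.121}
\]

By keeping more of the terms represented by the dots, and expanding
them as

\[
e^{-x - \frac{x}{2}(\zeta - 1)^2 + \cdots} = e^{-x - \frac{x}{2}(\zeta - 1)^2} \left[1 + a_1(\zeta - 1) + a_2(\zeta - 1)^2 + \cdots\right], \tag{19.122}
\]
we would find, on doing the integral, that

\[
\Gamma(x + 1) \approx \sqrt{2\pi}\, x^{x+1/2} e^{-x} \left[1 + \frac{1}{12x} + \frac{1}{288x^2} - \frac{139}{51840x^3} - \frac{571}{2488320x^4} + O\left(\frac{1}{x^5}\right)\right]. \tag{19.123}
\]
Since $\Gamma(n + 1) = n!$ we have the useful result
\[
n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n} \left[1 + \frac{1}{12n} + \cdots\right]. \tag{19.124}
\]

We make contact with our discussion of asymptotic series by rewriting the
expansion as
\[
\frac{\Gamma(x + 1)}{\sqrt{2\pi}\, x^{x+1/2} e^{-x}} \sim 1 + \frac{1}{12x} + \frac{1}{288x^2} - \frac{139}{51840x^3} - \frac{571}{2488320x^4} + \cdots. \tag{19.125}
\]

This is typical. We usually have to pull out a leading factor from the function
whose asymptotic behaviour we are studying, before we are left with a plain
asymptotic power series.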
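
The quality of the expansion (19.123) is easy to see by comparing it with a library Gamma function. A minimal sketch, with the coefficients copied from (19.123) and an illustrative function name:

```python
import math

# Coefficients of 1, 1/x, 1/x^2, 1/x^3, 1/x^4 in (19.123).
COEFFS = [1.0, 1 / 12, 1 / 288, -139 / 51840, -571 / 2488320]

def stirling(x, terms=5):
    """Leading factor sqrt(2 pi) x^(x+1/2) e^(-x) times the first `terms` corrections."""
    lead = math.sqrt(2 * math.pi) * x ** (x + 0.5) * math.exp(-x)
    return lead * sum(c / x ** n for n, c in enumerate(COEFFS[:terms]))

x = 10.0
exact = math.gamma(x + 1)                        # 10! = 3628800
rel_err_1 = abs(stirling(x, 1) - exact) / exact  # leading term only
rel_err_5 = abs(stirling(x, 5) - exact) / exact  # all five listed terms
print(rel_err_1, rel_err_5)
```

At $x = 10$ the leading term alone is accurate to a little under one percent, consistent with the $1/(12x)$ correction, while five terms reduce the relative error to the order of $1/x^5$.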


19.4.2 Airy functions


The Airy functions ${\rm Ai}(x)$ and ${\rm Bi}(x)$ are closely related to Bessel functions,
and are named after the mathematician and astronomer George Biddell Airy.
They occur widely in physics. We will investigate the behaviour of ${\rm Ai}(x)$ for
large values of $|x|$. A more sophisticated treatment is needed for this problem,
and we will meet with Stokes' phenomenon. Airy's differential equation is

\[
\frac{d^2y}{dz^2} - zy = 0. \tag{19.126}
\]
On the real axis Airy's equation becomes
\[
-\frac{d^2y}{dx^2} + xy = 0, \tag{19.127}
\]

and we can think of this as the Schrödinger equation for a particle running
up a linear potential. A classical particle incident from the left with total
energy $E = 0$ will come to rest at $x = 0$, and then retrace its path. The point
$x = 0$ is therefore called a classical turning point. The corresponding quantum
wavefunction, ${\rm Ai}(x)$, contains a travelling wave incident from the left and
becoming evanescent as it tunnels into the classically forbidden region, $x > 0$,
together with a reflected wave returning to $-\infty$. The sum of the incident
and reflected waves is a real-valued standing wave.




Figure 19.6: The Airy function ${\rm Ai}(x)$.

We will look for contour integral solutions to Airy's equation of the form

\[
y(x) = \int_C e^{xt} f(t)\, dt. \tag{19.128}
\]
Denoting the Airy differential operator by $L_x \equiv \partial^2_x - x$, we have
\[
\begin{aligned}
L_x y &= \int_C (t^2 - x) e^{xt} f(t)\, dt = \int_C f(t) \left\{t^2 - \frac{d}{dt}\right\} e^{xt}\, dt \\
&= \left[-e^{xt} f(t)\right]_C + \int_C \left(\left\{t^2 + \frac{d}{dt}\right\} f(t)\right) e^{xt}\, dt.
\end{aligned} \tag{19.129}
\]

Thus $f(t) = e^{-\frac{1}{3}t^3}$ and hence

\[
y(x) = \int_C e^{xt - \frac{1}{3}t^3}\, dt, \tag{19.130}
\]

provided the contour ends at points where the integrated-out term, $\left[e^{xt - \frac{1}{3}t^3}\right]_C$,
vanishes. There are therefore three possible contours, which end at any two
of

\[
+\infty, \quad \infty\, e^{2\pi i/3}, \quad \infty\, e^{-2\pi i/3}.
\]


Figure 19.7: Contours providing solutions of Airy’s equation. 


Since the integrand is an entire function, the sum $y_{C_1} + y_{C_2} + y_{C_3}$ is zero, so
only two of the three solutions are linearly independent. The Airy function
itself is defined by

\[
{\rm Ai}(x) = \frac{1}{2\pi i} \int_{C_1} e^{xt - \frac{1}{3}t^3}\, dt = \frac{1}{\pi} \int_0^\infty \cos\left(xs + \frac{1}{3}s^3\right) ds. \tag{19.131}
\]

In obtaining the last equality, we have deformed the contour of integration, $C_1$,
which ran from $\infty\, e^{-2\pi i/3}$ to $\infty\, e^{2\pi i/3}$, so that it lies on the imaginary axis,
and there we have written $t = is$. You may check (by extending Jordan's
lemma) that this deformation does not alter the value of the integral.

To study the asymptotics of this function we need to examine separately
the two cases $x \gg 0$ and $x \ll 0$. For both ranges of $x$, the principal contribution
to the integral will come from the neighbourhood of the stationary points
of $f(t) = xt - t^3/3$. In the complex plane, stationary points are never pure
maxima or minima of the real part of $f$ (the real part alone determines
the magnitude of the integrand) but are always saddle points. We must
deform the contour so that on the integration path the stationary point is
the highest point in a mountain pass. We must also ensure that everywhere
on the contour the difference between $f$ and its maximum value stays real.
Because of the orthogonality of the real and imaginary part contours, this
means that we must take a path of steepest descent from the pass, hence
the name of the method. If we stray from the steepest descent path, the
phase of the exponent will be changing. This means that the integrand will
oscillate and we can no longer be sure that the result is dominated by the
contributions near the saddle point.



Figure 19.8: Steepest descent contours, location of the stationary points and
orientation of the saddle passes for a) $x \gg 0$, b) $x \ll 0$.


i) $x \gg 0$: The stationary points are at $t = \pm\sqrt{x}$. Writing $t = \xi - \sqrt{x}$ we have

\[
f(\xi) = -\frac{2}{3}x^{3/2} + \xi^2\sqrt{x} - \frac{1}{3}\xi^3, \tag{19.132}
\]
while near $t = +\sqrt{x}$ we write $t = \xi + \sqrt{x}$ and find

\[
f(\xi) = +\frac{2}{3}x^{3/2} - \xi^2\sqrt{x} - \frac{1}{3}\xi^3. \tag{19.133}
\]

We see that the saddle point near $-\sqrt{x}$ is a local maximum when we
route the contour vertically, while the saddle point near $+\sqrt{x}$ is a local
maximum as we go down the real axis. Since the contour in ${\rm Ai}(x)$ is
aimed vertically we can distort it to pass through the saddle point near
$-\sqrt{x}$, but cannot find a route through the point at $+\sqrt{x}$ without the
integrand oscillating wildly. At the saddle point the exponent, $xt - t^3/3$,
is real. If we write $t = u + iv$ we have

\[
{\rm Im}\,(xt - t^3/3) = v\left(x - u^2 + \frac{1}{3}v^2\right), \tag{19.134}
\]

so the exact steepest descent path, on which the imaginary part remains
zero, is given by the union of the real axis ($v = 0$) and the curve

\[
u^2 - \frac{1}{3}v^2 = x. \tag{19.135}
\]

This is a hyperbola, and the branch passing through the saddle point
at $-\sqrt{x}$ is plotted in a).
Now setting $\xi = is$, we find

\[
{\rm Ai}(x) = \frac{1}{2\pi} e^{-\frac{2}{3}x^{3/2}} \int_{-\infty}^{\infty} e^{-\sqrt{x}\, s^2 + \cdots}\, ds \sim \frac{1}{2\sqrt{\pi}} x^{-1/4} e^{-\frac{2}{3}x^{3/2}}. \tag{19.136}
\]


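ii) $x \ll 0$: The stationary points are now at $\pm i\sqrt{|x|}$. Setting $t = \xi \pm i\sqrt{|x|}$
we find that

\[
f(\xi) = \mp i\frac{2}{3}|x|^{3/2} \mp i\xi^2\sqrt{|x|} - \frac{1}{3}\xi^3. \tag{19.137}
\]

The exponent is no longer real, but the imaginary part will be constant
and the integrand non-oscillatory provided we deform the contour so
that it becomes the disconnected pair of curves shown in b). The
new contour passes through both saddle points and we must sum their
contributions. Near $t = i\sqrt{|x|}$ we set $\xi = e^{3\pi i/4}s$ and get

\[
\begin{aligned}
\frac{1}{2\pi i} e^{3\pi i/4} e^{-i\frac{2}{3}|x|^{3/2}} \int_{-\infty}^{\infty} e^{-\sqrt{|x|}\, s^2}\, ds &= \frac{1}{2i\sqrt{\pi}} e^{3\pi i/4} |x|^{-1/4} e^{-i\frac{2}{3}|x|^{3/2}} \\
&= -\frac{1}{2i\sqrt{\pi}} |x|^{-1/4} e^{-i\left(\frac{2}{3}|x|^{3/2} + \pi/4\right)}.
\end{aligned} \tag{19.138}
\]
Near $t = -i\sqrt{|x|}$ we set $\xi = e^{\pi i/4}s$ and get

\[
\frac{1}{2\pi i} e^{\pi i/4} e^{i\frac{2}{3}|x|^{3/2}} \int_{-\infty}^{\infty} e^{-\sqrt{|x|}\, s^2}\, ds = \frac{1}{2i\sqrt{\pi}} |x|^{-1/4} e^{i\left(\frac{2}{3}|x|^{3/2} + \pi/4\right)}. \tag{19.139}
\]

The sum of these two contributions is
\[
{\rm Ai}(x) \sim \frac{1}{\sqrt{\pi}\, |x|^{1/4}} \sin\left(\frac{2}{3}|x|^{3/2} + \frac{\pi}{4}\right). \tag{19.140}
\]

The fruit of our labours is therefore

\[
{\rm Ai}(x) \sim \begin{cases} \dfrac{1}{2\sqrt{\pi}}\, x^{-1/4} e^{-\frac{2}{3}x^{3/2}} \left[1 + O\left(\dfrac{1}{x}\right)\right], & x > 0, \\[2ex] \dfrac{1}{\sqrt{\pi}\, |x|^{1/4}} \left[\sin\left(\dfrac{2}{3}|x|^{3/2} + \dfrac{\pi}{4}\right) + O\left(\dfrac{1}{x}\right)\right], & x < 0. \end{cases} \tag{19.141}
\]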
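
The leading terms in (19.141) can be compared with accurate values of ${\rm Ai}(x)$. The sketch below computes ${\rm Ai}$ in two independent ways: for $x > 0$ from the vertical contour $t = -\sqrt{x} + is$ through the saddle, on which (19.132) gives the exact integrand $e^{-\sqrt{x}s^2}\cos(s^3/3)$, and for either sign of $x$ from the Maclaurin series generated by $y'' = xy$. The initial values ${\rm Ai}(0) = 3^{-2/3}/\Gamma(2/3)$ and ${\rm Ai}'(0) = -3^{-1/3}/\Gamma(1/3)$ are standard results quoted as assumptions; the function names and cutoffs are illustrative.

```python
import math

def simpson(f, a, b, n=8000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def airy_saddle(x, s_max=8.0):
    """Ai(x) for x > 0 from the contour t = -sqrt(x) + i s (integrand is even in s)."""
    rt = math.sqrt(x)
    integrand = lambda s: math.exp(-rt * s * s) * math.cos(s ** 3 / 3)
    return math.exp(-2 * x ** 1.5 / 3) * simpson(integrand, 0.0, s_max) / math.pi

def airy_series(x, terms=120):
    """Maclaurin series of Ai: coefficients obey a[n+3] = a[n] / ((n+3)(n+2))."""
    a = [1 / (3 ** (2 / 3) * math.gamma(2 / 3)),
         -1 / (3 ** (1 / 3) * math.gamma(1 / 3)),
         0.0]
    for n in range(terms):
        a.append(a[n] / ((n + 3) * (n + 2)))
    return sum(c * x ** n for n, c in enumerate(a))

def airy_leading(x):
    """Leading terms of (19.141) on either side of the turning point."""
    if x > 0:
        return math.exp(-2 * x ** 1.5 / 3) / (2 * math.sqrt(math.pi) * x ** 0.25)
    ax = abs(x)
    return math.sin(2 * ax ** 1.5 / 3 + math.pi / 4) / (math.sqrt(math.pi) * ax ** 0.25)

print(airy_saddle(2.0), airy_series(2.0))
print(airy_leading(4.0), airy_series(4.0))
print(airy_leading(-6.0), airy_series(-6.0))
```

Already at $x = 4$ and $x = -6$ the leading asymptotic forms agree with the series values to about one percent.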


Suppose that we allow $x$ to become complex, $x \to z = |z|e^{i\theta}$, with $-\pi <
\theta < \pi$. Then figure 19.9 shows how the steepest descent contour evolves and leads
to the two quite different expansions for positive and negative $x$. We see that
for $0 < \theta < 2\pi/3$ the steepest descent path continues to be routed through
the single stationary point at $-\sqrt{|z|}\,e^{i\theta/2}$. Once $\theta$ reaches $2\pi/3$, though, it
passes through both stationary points. The contribution to the integral from
the newly acquired stationary point is, however, exponentially smaller as
$|z| \to \infty$ than that of $t = -\sqrt{|z|}\,e^{i\theta/2}$. The new term is therefore said to
be subdominant, and makes an insignificant contribution to the asymptotic
behaviour of ${\rm Ai}(z)$. The two saddle points only make contributions of the
same magnitude when $\theta$ reaches $\pi$. If we analytically continue beyond $\theta = \pi$,
the new saddle point will now dominate over the old, and only its contribution
is significant at large $|z|$. The Stokes line, at which we must change the form
of the asymptotic expansion, is therefore at $\theta = \pi$.

Figure 19.9: Evolution of the steepest-descent contour from passing through
only one saddle point to passing through both. The dashed and solid lines are
contours of the real and imaginary parts, respectively, of $(zt - t^3/3)$. $\theta = {\rm Arg}\, z$
takes the values a) $7\pi/12$, b) $5\pi/8$, c) $2\pi/3$, d) $3\pi/4$.

If we try to systematically keep higher order terms we will find, for the
oscillating ${\rm Ai}(-z)$, a double series

\[
{\rm Ai}(-z) \sim \pi^{-1/2} z^{-1/4} \left[\sin(\rho + \pi/4) \sum_{n=0}^{\infty} (-1)^n c_{2n} \rho^{-2n} - \cos(\rho + \pi/4) \sum_{n=0}^{\infty} (-1)^n c_{2n+1} \rho^{-2n-1}\right], \tag{19.142}
\]
where $\rho = 2z^{3/2}/3$. In this case, therefore, we need to extract two leading
coefficients before we have asymptotic power series.

The subject of asymptotics contains many subtleties, and the reader in 
search of a more detailed discussion is recommended to read Bender and 
Orszag’s Advanced Mathematical Methods for Scientists and Engineers. 


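Exercise 19.2: Consider the behaviour of Bessel functions when $x$ is large. By
applying the method of steepest descent to the Hankel function contours show
that
\[
H^{(1)}_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\, e^{i(x - \nu\pi/2 - \pi/4)} \left[1 + i\,\frac{4\nu^2 - 1}{8x} + \cdots\right],
\]
\[
H^{(2)}_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\, e^{-i(x - \nu\pi/2 - \pi/4)} \left[1 - i\,\frac{4\nu^2 - 1}{8x} + \cdots\right],
\]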


and hence 


19.5 Elliptic functions 


The subject of elliptic functions goes back to remarkable identities of Giulio
Fagnano (1750) and Leonhard Euler (1761). Euler's formula is

\[
\int_0^u \frac{dx}{\sqrt{1 - x^4}} + \int_0^v \frac{dy}{\sqrt{1 - y^4}} = \int_0^r \frac{dz}{\sqrt{1 - z^4}}, \tag{19.143}
\]
where 0 < u,v < 1, and 
\[
r = \frac{u\sqrt{1 - v^4} + v\sqrt{1 - u^4}}{1 + u^2v^2}. \tag{19.144}
\]
This looks mysterious, but perhaps so does
\[
\int_0^u \frac{dx}{\sqrt{1 - x^2}} + \int_0^v \frac{dy}{\sqrt{1 - y^2}} = \int_0^r \frac{dz}{\sqrt{1 - z^2}}, \tag{19.145}
\]

where

\[
r = u\sqrt{1 - v^2} + v\sqrt{1 - u^2}, \tag{19.146}
\]
until you realize that the latter formula (19.146) is merely
\[
\sin(a + b) = \sin a \cos b + \cos a \sin b \tag{19.147}
\]
in disguise. To see this set
\[
u = \sin a, \quad v = \sin b \tag{19.148}
\]

and remember the integral formula for the inverse trigonometric sine function

\[
a = \sin^{-1} u = \int_0^u \frac{dx}{\sqrt{1 - x^2}}. \tag{19.149}
\]
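
Euler's formula (19.143) with $r$ given by (19.144) can be confirmed numerically in the same spirit. The sketch below uses a hand-written Simpson rule and the illustrative values $u = 0.3$, $v = 0.4$, chosen small enough that the sum of the two arcs stays below the value of the integral at the upper limit 1.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def lemniscate_arc(u):
    """Integral of dx / sqrt(1 - x^4) from 0 to u, for 0 <= u < 1."""
    return simpson(lambda x: 1 / math.sqrt(1 - x ** 4), 0.0, u)

def euler_r(u, v):
    """The combination (19.144)."""
    return (u * math.sqrt(1 - v ** 4) + v * math.sqrt(1 - u ** 4)) / (1 + u * u * v * v)

u, v = 0.3, 0.4
r = euler_r(u, v)
lhs = lemniscate_arc(u) + lemniscate_arc(v)
rhs = lemniscate_arc(r)
print(r, lhs, rhs)
```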
The Fagnano-Euler formula is a similarly-disguised addition formula for an
elliptic function. Just as we use the substitution $x = \sin y$ in the $1/\sqrt{1 - x^2}$
integral, we can use an elliptic-function substitution to evaluate elliptic
integrals which involve square-roots of quartic or cubic polynomials. Examples
are

\[
I_4 = \int_0^x \frac{dt}{\sqrt{(t - a_1)(t - a_2)(t - a_3)(t - a_4)}}, \tag{19.150}
\]

\[
I_3 = \int_0^x \frac{dt}{\sqrt{(t - a_1)(t - a_2)(t - a_3)}}. \tag{19.151}
\]

Note that $I_4$ can be reduced to an integral of the form $I_3$ by using a
Möbius-map substitution

\[
t = \frac{at' + b}{ct' + d}, \quad dt = (ad - bc)\frac{dt'}{(ct' + d)^2} \tag{19.152}
\]

to send $a_4$ to infinity. Indeed, we can use a suitable Möbius map to send any
three of the four points $a_i$ to $0, 1, \infty$.

The idea of elliptic functions (as opposed to elliptic integrals, which are
their functional inverses) was known to Gauss, but Abel and Jacobi were the
first to publish (1827). For developing the general theory, the simplest
elliptic function is the Weierstrass $\wp$. This is really a family of functions that
is parametrized by a pair of linearly independent complex numbers or periods
$\omega_1$, $\omega_2$. For a given pair of periods, the $\wp$ function is defined by the double
sum

\[
\wp(z) = \frac{1}{z^2} + \sum_{(m,n) \ne (0,0)} \left\{\frac{1}{(z - m\omega_1 - n\omega_2)^2} - \frac{1}{(m\omega_1 + n\omega_2)^2}\right\}. \tag{19.153}
\]

Helped by the counterterm, the sum is absolutely convergent, so we can
rearrange the terms to prove double periodicity

\[
\wp(z + m\omega_1 + n\omega_2) = \wp(z), \quad m, n \in {\bf Z}. \tag{19.154}
\]

The function is thus determined everywhere by its values in the period
parallelogram $P = \{\lambda\omega_1 + \mu\omega_2 : 0 \le \lambda, \mu < 1\}$. Double periodicity is the defining
characteristic of elliptic functions.


Figure 19.10: Unit cell and double-periodicity. 


Any non-constant meromorphic function $f(z)$ that is doubly periodic has
four basic properties:

a) The function must have at least one pole in its unit cell. Otherwise
it would be holomorphic and bounded, and therefore a constant by
Liouville.

b) The sum of the residues at the poles must add to zero. This follows
from integrating $f(z)$ around the boundary of the period parallelogram
and observing that the contributions from opposite edges cancel.

c) The number of poles in each unit cell must equal the number of zeros.
This follows from integrating $f'/f$ around the boundary of the period
parallelogram.

d) If $f$ has zeros at the $N$ points $z_i$ and poles at the $N$ points $p_i$ then

\[
\sum_{i=1}^{N} z_i - \sum_{i=1}^{N} p_i = n\omega_1 + m\omega_2
\]

where $m$, $n$ are integers. This follows from integrating $z f'/f$ around
the boundary of the period parallelogram.

The Weierstrass $\wp$ has a second-order pole at the origin. It also obeys

\[
\begin{aligned}
\lim_{|z| \to 0} \left(\wp(z) - \frac{1}{z^2}\right) &= 0, \\
\wp(z) &= \wp(-z), \\
\wp'(z) &= -\wp'(-z).
\end{aligned} \tag{19.155}
\]
The property that makes $\wp(z)$ useful for evaluating integrals is
\[
\left(\wp'(z)\right)^2 = 4\wp^3(z) - g_2\wp(z) - g_3, \tag{19.156}
\]
where
\[
g_2 = 60 \sum_{(m,n) \ne (0,0)} \frac{1}{(m\omega_1 + n\omega_2)^4}, \qquad g_3 = 140 \sum_{(m,n) \ne (0,0)} \frac{1}{(m\omega_1 + n\omega_2)^6}. \tag{19.157}
\]

Equation (19.156) is proved by examining the first few terms in the Laurent
expansion in $z$ of the difference of the left hand and right hand sides. All
negative powers cancel, as does the constant term. The difference is zero at
$z = 0$, has no poles or other singularities, and, being continuous and
periodic, is automatically bounded. It is therefore identically zero by Liouville's
theorem.

From the symmetry and periodicity of $\wp$ we see that $\wp'(z) = 0$ at $\omega_1/2$,
$\omega_2/2$ and $(\omega_1 + \omega_2)/2$ where $\wp(z)$ takes the values $e_1 = \wp(\omega_1/2)$, $e_2 = \wp(\omega_2/2)$,
and $e_3 = \wp((\omega_1 + \omega_2)/2)$. Now $\wp'$ must have exactly three zeros since it has a
pole of order three at the origin and, by property c), the number of zeros in
the unit cell is equal to the number of poles. We therefore know the location
of all three zeros, and can factorize:

\[
4\wp^3(z) - g_2\wp(z) - g_3 = 4(\wp - e_1)(\wp - e_2)(\wp - e_3). \tag{19.158}
\]

We note that the coefficient of $\wp^2$ in the polynomial on the left side is zero,
implying that $e_1 + e_2 + e_3 = 0$.
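
These statements can be explored numerically by truncating the lattice sums (19.157) and the double sum defining $\wp$. The sketch below is only an illustration: it assumes the square lattice $\omega_1 = 1$, $\omega_2 = i$, keeps $|m|, |n| \le 50$ (so agreement is limited to a few decimal places by the truncation), and uses the closed form $g_2 = \Gamma(1/4)^8/(16\pi^2)$ for this particular lattice, a standard result that is not derived in the text.

```python
import math

def lattice_points(w1, w2, N):
    """Nonzero lattice points m*w1 + n*w2 with |m|, |n| <= N."""
    return [m * w1 + n * w2
            for m in range(-N, N + 1) for n in range(-N, N + 1)
            if (m, n) != (0, 0)]

def wp(z, pts):
    """Truncated double sum for the Weierstrass function."""
    return 1 / z ** 2 + sum(1 / (z - w) ** 2 - 1 / w ** 2 for w in pts)

def wp_prime(z, pts):
    """Term-by-term derivative of the truncated sum."""
    return -2 / z ** 3 - 2 * sum(1 / (z - w) ** 3 for w in pts)

def invariants(pts):
    """Truncated lattice sums (19.157)."""
    g2 = 60 * sum(1 / w ** 4 for w in pts)
    g3 = 140 * sum(1 / w ** 6 for w in pts)
    return g2, g3

w1, w2 = 1.0, 1.0j                       # square lattice, chosen only for illustration
pts = lattice_points(w1, w2, 50)
g2, g3 = invariants(pts)

z = 0.3 + 0.2j
lhs = wp_prime(z, pts) ** 2              # left side of (19.156)
rhs = 4 * wp(z, pts) ** 3 - g2 * wp(z, pts) - g3
e_sum = wp(w1 / 2, pts) + wp(w2 / 2, pts) + wp((w1 + w2) / 2, pts)

print(abs(lhs - rhs) / abs(lhs))         # small relative mismatch
print(abs(e_sum))                        # e1 + e2 + e3, close to zero
print(g2.real, math.gamma(0.25) ** 8 / (16 * math.pi ** 2), abs(g3))
```

For the square lattice the fourfold rotation symmetry forces $g_3 = 0$, which the truncated sum reproduces to rounding accuracy.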

The roots $e_i$ can never coincide. For example, $(\wp(z) - e_1)$ has a double
zero at $\omega_1/2$, but two zeros is all it is allowed because the number of poles
per unit cell equals the number of zeros, and $(\wp(z) - e_1)$ has a double pole at
0 as its only singularity. Thus, $(\wp - e_1)$ cannot be zero at another point, but
it would be if $e_1$ coincided with $e_2$ or $e_3$. As a consequence, the discriminant

\[
\Delta \stackrel{\rm def}{=} 16(e_1 - e_2)^2(e_2 - e_3)^2(e_1 - e_3)^2 = g_2^3 - 27g_3^2, \tag{19.159}
\]
is never zero.
We use $\wp$ and (19.156) to write

\[
z = \wp^{-1}(u) = \int_\infty^u \frac{dt}{2\sqrt{(t - e_1)(t - e_2)(t - e_3)}} = \int_\infty^u \frac{dt}{\sqrt{4t^3 - g_2t - g_3}}. \tag{19.160}
\]
This maps the $u$ plane, with cuts that we can take from $e_1$ to $e_2$ and $e_3$ to $\infty$,
one-to-one onto the 2-torus, regarded as the unit cell of the $\omega_{n,m} = n\omega_1 + m\omega_2$
lattice.
As $z$ sweeps over the torus, the points $x = \wp(z)$, $y = \wp'(z)$ move on the
elliptic curve
\[
y^2 = 4x^3 - g_2x - g_3 \tag{19.161}
\]

which should be thought of as a set in ${\bf C}P^2$. These curves, and the finite fields
of rational points that lie on them, are exploited in modern cryptography.

The magic that leads to addition formulae, such as the Euler-Fagnano
relation (19.144) with which we began this section, lies in the (not immediately
obvious) fact that any elliptic function having the same periods as $\wp(z)$ can
be expressed as a rational function of $\wp(z)$ and $\wp'(z)$. From this it follows
(after some thought) that any two such elliptic functions, $f_1(z)$ and $f_2(z)$,
obey a relation $F(f_1, f_2) = 0$, where

\[
F(x, y) = \sum a_{n,m} x^n y^m \tag{19.162}
\]

is a polynomial in $x$ and $y$. We can eliminate $\wp'(z)$ in these relations by
writing $\wp'(z) = \sqrt{4\wp^3(z) - g_2\wp(z) - g_3}$.


Modular invariance 


If $\omega_1$ and $\omega_2$ are periods and define a unit cell, so are


$$\omega_1' = a\omega_1 + b\omega_2,$$

$$\omega_2' = c\omega_1 + d\omega_2,$$


where $a, b, c, d$ are integers with $ad - bc = \pm 1$. This condition on the
determinant ensures that the matrix inverse also has integer entries, and so the $\omega_i$
can be expressed in terms of the $\omega_i'$ with integer coefficients. Consequently
the set of integer linear combinations of the $\omega_i'$ generates the same lattice as
the integer linear combinations of the original $\omega_i$. This notion of redefining
the unit cell should be familiar to you from solid state physics. If we wish
to preserve the orientation of the basis vectors, we must restrict ourselves
to maps whose determinant $ad - bc$ is unity. The set of such transformations
constitutes the modular group $SL(2,\mathbb{Z})$. Clearly $\wp$ is invariant under this
group, as are $g_2$ and $g_3$ and $\Delta$. Now define $\omega_2/\omega_1 = \tau$, and write


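$$g_2(\omega_1,\omega_2) = \frac{1}{\omega_1^4}\,\tilde g_2(\tau), \qquad g_3(\omega_1,\omega_2) = \frac{1}{\omega_1^6}\,\tilde g_3(\tau), \qquad \Delta(\omega_1,\omega_2) = \frac{1}{\omega_1^{12}}\,\tilde\Delta(\tau),$$ (19.163)

and also

$$J(\tau) = \frac{\tilde g_2^3}{\tilde g_2^3 - 27\tilde g_3^2} = \frac{\tilde g_2^3}{\tilde\Delta}.$$ (19.164)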


Because the denominator is never zero when ${\rm Im}\,\tau > 0$, the function $J(\tau)$ is
holomorphic in the upper half-plane, but not on the real axis. The function
$J(\tau)$ is called the elliptic modular function.

Except for the prefactors $\omega_1^n$, the functions $\tilde g_i(\tau)$, $\tilde\Delta(\tau)$ and $J(\tau)$ are
invariant under the Möbius transformation


$$\tau \to \frac{a\tau + b}{c\tau + d}$$ (19.165)

with

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2,\mathbb{Z}).$$ (19.166)


This Möbius transformation does not change if the entries in the matrix are
multiplied by a common factor of $\pm 1$, and so the transformation is an element
of the modular group $PSL(2,\mathbb{Z}) \equiv SL(2,\mathbb{Z})/\{I, -I\}$.
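Two of the claims above can be illustrated numerically: an integer matrix with determinant $\pm 1$ has an integer inverse, and a matrix and its negative give the same Möbius map. The Python sketch below uses a sample matrix and a sample $\tau$ chosen purely for illustration; the helper names are ours.

```python
def det(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def inverse(m):
    """Inverse of an integer 2x2 matrix with determinant +1 or -1 (entries stay integers)."""
    (a, b), (c, d) = m
    D = det(m)
    if D not in (1, -1):
        raise ValueError("determinant must be +1 or -1")
    # 1/D equals D when D = +-1, so no division is needed.
    return ((d * D, -b * D), (-c * D, a * D))

def matmul(m, n):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def mobius(m, tau):
    """Moebius transformation tau -> (a tau + b)/(c tau + d)."""
    (a, b), (c, d) = m
    return (a * tau + b) / (c * tau + d)

m = ((2, 1), (1, 1))                                   # sample element of SL(2, Z)
neg_m = tuple(tuple(-x for x in row) for row in m)     # the matrix -m
tau = complex(0.3, 1.2)                                # sample point with Im(tau) > 0

print(inverse(m))
print(mobius(m, tau), mobius(neg_m, tau))
```

For determinant $+1$ the image of $\tau$ stays in the upper half-plane, since ${\rm Im}\,\tau' = {\rm Im}\,\tau/|c\tau + d|^2$.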

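Taking into account the change in the $\omega_1^n$ prefactors we have


$$\tilde g_2\left(\frac{a\tau+b}{c\tau+d}\right) = (c\tau+d)^4\,\tilde g_2(\tau),$$
$$\tilde g_3\left(\frac{a\tau+b}{c\tau+d}\right) = (c\tau+d)^6\,\tilde g_3(\tau),$$
$$\tilde\Delta\left(\frac{a\tau+b}{c\tau+d}\right) = (c\tau+d)^{12}\,\tilde\Delta(\tau).$$ (19.167)


Because $c = 0$ and $d = 1$ for the special case $\tau \to \tau + 1$, these three functions
obey $f(\tau+1) = f(\tau)$ and so depend on $\tau$ only via the combination $q^2 = e^{2\pi i\tau}$.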


For example, it is not hard to prove that

$$\tilde\Delta(\tau) = (2\pi)^{12}\, q^2 \prod_{n=1}^{\infty} \left(1 - q^{2n}\right)^{24}.$$ (19.168)


We can also expand these functions as power series in $q^2$, and here things
get interesting because the coefficients have number-theoretic properties. For
example


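$$\tilde g_2(\tau) = (2\pi)^4\left[\frac{1}{12} + 20\sum_{n=1}^{\infty}\sigma_3(n)\,q^{2n}\right],$$

$$\tilde g_3(\tau) = (2\pi)^6\left[\frac{1}{216} - \frac{7}{3}\sum_{n=1}^{\infty}\sigma_5(n)\,q^{2n}\right].$$ (19.169)


The symbol $\sigma_k(n)$ is defined by $\sigma_k(n) = \sum d^k$, where $d$ runs over all positive
divisors of the number $n$.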
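The expansions (19.169) and the product formula (19.168) can be checked against each other order by order. The Python sketch below works with exact rationals, writes $Q = q^2$, and strips the common factor $(2\pi)^{12}$; the function names and the truncation order are our own choices, made only for illustration.

```python
from fractions import Fraction

def sigma(k, n):
    """Sum of the k-th powers of the positive divisors of n."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)

def multiply(a, b, order):
    """Product of two power series in Q (coefficient lists), truncated to `order` terms."""
    c = [Fraction(0)] * order
    for i in range(order):
        for j in range(order - i):
            c[i + j] += a[i] * b[j]
    return c

def delta_from_invariants(order):
    """Coefficients of (g2^3 - 27 g3^2)/(2 pi)^12 as a series in Q = q^2."""
    g2 = [Fraction(1, 12)] + [Fraction(20 * sigma(3, n)) for n in range(1, order)]
    g3 = [Fraction(1, 216)] + [Fraction(-7, 3) * sigma(5, n) for n in range(1, order)]
    g2_cubed = multiply(multiply(g2, g2, order), g2, order)
    g3_squared = multiply(g3, g3, order)
    return [x - 27 * y for x, y in zip(g2_cubed, g3_squared)]

def delta_from_product(order):
    """Coefficients of Q * prod_{n>=1} (1 - Q^n)^24, truncated (needs order >= 2)."""
    series = [Fraction(0)] * order
    series[1] = Fraction(1)
    for n in range(1, order):
        factor = [Fraction(0)] * order
        factor[0], factor[n] = Fraction(1), Fraction(-1)
        for _ in range(24):
            series = multiply(series, factor, order)
    return series

print(delta_from_invariants(6))
print(delta_from_product(6))
```

Both lists agree term by term; the constant terms cancel because $(1/12)^3 = 27\,(1/216)^2$, which is why the expansion of $\tilde\Delta$ starts at order $q^2$.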
In the case of the function $J(\tau)$, the prefactors cancel and


$$J\left(\frac{a\tau+b}{c\tau+d}\right) = J(\tau),$$ (19.170)


so $J(\tau)$ is a modular invariant. One can show that if $J(\tau_1) = J(\tau_2)$, then


$$\tau_2 = \frac{a\tau_1 + b}{c\tau_1 + d}$$ (19.171)


for some modular transformation with integer $a, b, c, d$, where $ad - bc = 1$,
and further, that any modular-invariant function is a rational function of
$J(\tau)$. It seems clear that $J(\tau)$ is rather a special object.

This $J(\tau)$ is the function referred to on page 560 in connection with the
Monster group. As with the $\tilde g_i$, $J(\tau)$ depends on $\tau$ only through $q^2$. The first
few terms in the power series expansion of $J(\tau)$ in terms of $q^2$ turn out to be


$$1728\,J(\tau) = q^{-2} + 744 + 196884\,q^2 + 21493760\,q^4 + 864299970\,q^6 + \cdots.$$ (19.172)


Since $AJ(\tau) + B$ has all the same modular invariance properties as $J(\tau)$, the
numbers 1728 ($= 12^3$) and 744 are just conventional normalizations. Once we
set the coefficient of $q^{-2}$ to unity, however, the remaining integer coefficients
are completely determined by the modular properties. A number-theory
interpretation of these integers seemed lacking until John McKay and others
observed that


$$\begin{aligned}
1 &= 1\\
196884 &= 1 + 196883\\
21493760 &= 1 + 196883 + 21296876\\
864299970 &= 2\times 1 + 2\times 196883 + 21296876 + 842609326,
\end{aligned}$$ (19.173)


where “1” and the large integers on the right-hand side are the dimensions of
the smallest irreducible representations of the Monster. This “Monstrous
Moonshine” was originally mysterious and almost unbelievable (“moonshine”
= “fantastic nonsense”), but it was explained by Richard Borcherds
by the use of techniques borrowed from string theory.³ Borcherds received
the 1998 Fields Medal for this work.
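Both the expansion (19.172) and the decompositions (19.173) can be reproduced with integer arithmetic. Combining (19.164), (19.168) and (19.169) gives $1728\,J = E_4^3/\bigl(Q\prod_n(1-Q^n)^{24}\bigr)$ with $Q = q^2$ and $E_4 = 1 + 240\sum_n \sigma_3(n)Q^n$. The Python sketch below implements this; the function names are ours, and the four representation dimensions are the ones quoted in (19.173).

```python
def sigma(k, n):
    """Sum of the k-th powers of the positive divisors of n."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)

def multiply(a, b, order):
    """Product of two power series (coefficient lists), truncated to `order` terms."""
    c = [0] * order
    for i in range(order):
        for j in range(order - i):
            c[i + j] += a[i] * b[j]
    return c

def reciprocal(a, order):
    """Reciprocal of a power series whose constant term is 1."""
    inv = [0] * order
    inv[0] = 1
    for n in range(1, order):
        inv[n] = -sum(a[k] * inv[n - k] for k in range(1, n + 1))
    return inv

def j_coefficients(order):
    """Coefficients of Q^(-1), Q^0, Q^1, ... in 1728 J, where Q = q^2."""
    e4 = [1] + [240 * sigma(3, n) for n in range(1, order)]
    e4_cubed = multiply(multiply(e4, e4, order), e4, order)
    product = [1] + [0] * (order - 1)           # builds prod (1 - Q^n)^24
    for n in range(1, order):
        factor = [0] * order
        factor[0], factor[n] = 1, -1
        for _ in range(24):
            product = multiply(product, factor, order)
    return multiply(e4_cubed, reciprocal(product, order), order)

coefficients = j_coefficients(5)
print(coefficients)

# Dimensions of the four smallest irreducible representations, as quoted in (19.173).
d0, d1, d2, d3 = 1, 196883, 21296876, 842609326
decompositions = [d0 + d1, d0 + d1 + d2, 2 * d0 + 2 * d1 + d2 + d3]
print(decompositions)
```

The last three entries of `coefficients` coincide with `decompositions`, which is the observation attributed to McKay and others in the text.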


19.6 Further exercises and problems 


Exercise 19.3: Show that the binomial series expansion of $(1 + x)^{-\nu}$ can be
written as


$$(1+x)^{-\nu} = \sum_{m=0}^{\infty} (-x)^m\, \frac{\Gamma(m+\nu)}{\Gamma(\nu)\, m!}, \qquad |x| < 1.$$


Exercise 19.4: A Mellin transform and its inverse. Combine the Beta-function
identity (19.15) with a suitable change of variables to evaluate the Mellin
transform


$$\int_0^\infty x^{s-1}(1+x)^{-\nu}\,dx, \qquad \nu > 0,$$


of $(1+x)^{-\nu}$ as a product of Gamma functions. Now consider the Bromwich
contour integral

$$\frac{1}{2\pi i\,\Gamma(\nu)}\int_{c-i\infty}^{c+i\infty} x^{-s}\,\Gamma(\nu - s)\,\Gamma(s)\,ds.$$

Here ${\rm Re}\,c \in (0,\nu)$. The Bromwich contour therefore runs parallel to the
imaginary axis with the poles of $\Gamma(s)$ to its left and the poles of $\Gamma(\nu - s)$ to its
right. Use the identity


$$\Gamma(s)\Gamma(1 - s) = \pi\,{\rm cosec}\,\pi s$$


to show that when $|x| < 1$ the contour can be closed by a large semicircle lying
to the left of the imaginary axis. By using the preceding exercise to sum the
contributions from the enclosed poles at $s = -n$, evaluate the integral.

³ “I was in Kashmir. I had been traveling around northern India, and there was one
really long tiresome bus journey, which lasted about 24 hours. Then the bus had to stop
because there was a landslide and we couldn’t go any further. It was all pretty darn
unpleasant. Anyway, I was just toying with some calculations on this bus journey and
finally I found an idea which made everything work.” Richard Borcherds (interview in
The Guardian, August 1998).


Exercise 19.5: Mellin-Barnes integral. Use the technique developed in the 
preceding exercise to show that 


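$$F(a, b; c; -x) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\,\frac{1}{2\pi i}\int_{c'-i\infty}^{c'+i\infty} x^{-s}\,\frac{\Gamma(a-s)\,\Gamma(b-s)\,\Gamma(s)}{\Gamma(c-s)}\,ds$$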


for a suitable range of x. This integral representation of the hypergeometric 
function is due to the English mathematician Ernest Barnes (1908), later a 
controversial Bishop of Birmingham. 


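Exercise 19.6: Let

$$Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$

Show that the matrix differential equation

$$\frac{d}{dz}Y = \frac{A}{z}\,Y + \frac{B}{1-z}\,Y,$$

where

$$A = \begin{pmatrix} 0 & a \\ 0 & 1-c \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ b & a+b-c+1 \end{pmatrix},$$

has a solution

$$Y(z) = F(a,b;c;z)\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \frac{z}{a}\,F'(a,b;c;z)\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$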


Exercise 19.7: Knizhnik-Zamolodchikov equation. The monodromy properties
of solutions of differential equations play an important role in conformal field
theory. The Fuchsian equations studied in this exercise are obeyed by the
correlation functions in the level-$k$ Wess-Zumino-Witten model.


Let $V^{(a)}$, $a = 1,\ldots,n$, be spin-$j_a$ representation spaces for the group SU(2). Let
$W(z_1,\ldots,z_n)$ be a function taking values in $V^{(1)}\otimes V^{(2)}\otimes\cdots\otimes V^{(n)}$. (In other
words $W$ is a function $W_{i_1,\ldots,i_n}(z_1,\ldots,z_n)$ where the index $i_a$ labels states in
the spin-$j_a$ factor.) Suppose that $W$ obeys the Knizhnik-Zamolodchikov (K-Z)
equations


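$$(k+2)\,\frac{\partial}{\partial z_a}W = \sum_{b,\,b\ne a}\frac{J^{(a)}\cdot J^{(b)}}{z_a - z_b}\,W, \qquad a = 1,\ldots,n,$$

where

$$J^{(a)}\cdot J^{(b)} \equiv J_1^{(a)}J_1^{(b)} + J_2^{(a)}J_2^{(b)} + J_3^{(a)}J_3^{(b)},$$

and $J_i^{(a)}$ indicates the su(2) generator $J_i$ acting on the $V^{(a)}$ factor in the tensor
product. If we set $z_1 = z$, for example, and fix the position of $z_2,\ldots,z_n$, then
the differential equation in $z$ has regular singular points at the $n-1$ remaining
$z_b$.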


a) By diagonalizing the operator $J^{(a)}\cdot J^{(b)}$ show that there are solutions
$W(z)$ that behave for $z_a$ close to $z_b$ as


$$W(z) \sim (z_a - z_b)^{\Delta_j - \Delta_{j_a} - \Delta_{j_b}},$$


where

$$\Delta_j = \frac{j(j+1)}{k+2}, \qquad \Delta_{j_a} = \frac{j_a(j_a+1)}{k+2},$$

and $j$ is one of the spins $|j_a - j_b| \le j \le j_a + j_b$ occurring in the
decomposition of $j_a \otimes j_b$.
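b) Define covariant derivatives


$$\nabla_a = \frac{\partial}{\partial z_a} - \frac{1}{k+2}\sum_{b,\,b\ne a}\frac{J^{(a)}\cdot J^{(b)}}{z_a - z_b}$$


and show that $[\nabla_a, \nabla_b] = 0$. Conclude that the effect of parallel transport
of the solutions of the K-Z equations provides a representation of the
braid group of the world lines of the $z_a$.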
Appendix A


Linear Algebra Review 


In physics we often have to work with infinite dimensional vector spaces. 
Navigating these vasty deeps is much easier if you have a sound grasp of the 
theory of finite dimensional spaces. Most physics students have studied this 
as undergraduates, but not always in a systematic way. In this appendix 
we gather together and review those parts of linear algebra that we will find 
useful in the main text. 


A.1 Vector space 


A.1.1 Axioms 


A vector space $V$ over a field $F$ is a set equipped with two operations: a
binary operation called vector addition which assigns to each pair of elements
$x, y \in V$ a third element denoted by $x + y$, and scalar multiplication which
assigns to an element $x \in V$ and $\lambda \in F$ a new element $\lambda x \in V$. There is also
a distinguished element $0 \in V$ such that the following axioms are obeyed:¹

1) Vector addition is commutative: $x + y = y + x$.

2) Vector addition is associative: $(x + y) + z = x + (y + z)$.

3) Additive identity: $0 + x = x$.

4) Existence of an additive inverse: for any $x \in V$, there is an element
$(-x) \in V$, such that $x + (-x) = 0$.

5) Scalar distributive law i): $\lambda(x + y) = \lambda x + \lambda y$.

6) Scalar distributive law ii): $(\lambda + \mu)x = \lambda x + \mu x$.

7) Scalar multiplication is associative: $(\lambda\mu)x = \lambda(\mu x)$.

8) Multiplicative identity: $1x = x$.

¹ In this list $1, \lambda, \mu \in F$ and $x, y, 0 \in V$.

The elements of V are called vectors. We will only consider vector spaces 
over the field of the real numbers, F = R, or the complex numbers, F = C. 

You have no doubt been working with vectors for years, and are saying to 
yourself “I know this stuff.” Perhaps so, but to see if you really understand 
these axioms try the following exercise. Its value lies not so much in the 
solution of its parts, which are easy, as in appreciating that these commonly 
used properties both can and need to be proved from the axioms. (Hint: 
work the problems in the order given; the later parts depend on the earlier.) 


Exercise A.1: Use the axioms to show that:


i) If $x + \tilde 0 = x$, then $\tilde 0 = 0$.

ii) We have $0x = 0$ for any $x \in V$. Here the 0 on the left is the additive identity in $F$.

iii) If $x + y = 0$, then $y = -x$. Thus the additive inverse is unique.

iv) Given $x, y$ in $V$, there is a unique $z$ such that $x + z = y$, to wit $z = y - x$.

v) $\lambda 0 = 0$ for any $\lambda \in F$.

vi) If $\lambda x = 0$, then either $x = 0$ or $\lambda = 0$.

vii) $(-1)x = -x$.


A.1.2 Bases and components 


Let V be a vector space over F. For the moment, this space has no additional 
structure beyond that of the previous section — no inner product and so no 
notion of what it means for two vectors to be orthogonal. There is still much 
that can be done, though. Here are the most basic concepts and properties 
that need to be understood:


i) A set of vectors $\{e_1, e_2, \ldots, e_n\}$ is linearly dependent if there exist $\lambda^\mu \in F$,
not all zero, such that

$$\lambda^1 e_1 + \lambda^2 e_2 + \cdots + \lambda^n e_n = 0.$$ (A.1)

ii) If it is not linearly dependent, a set of vectors $\{e_1, e_2, \ldots, e_n\}$ is linearly
independent. For a linearly independent set, a relation

$$\lambda^1 e_1 + \lambda^2 e_2 + \cdots + \lambda^n e_n = 0$$ (A.2)

can hold only if all the $\lambda^\mu$ are zero.

iii) A set of vectors $\{e_1, e_2, \ldots, e_n\}$ is said to span $V$ if for any $x \in V$ there
are numbers $x^\mu$ such that $x$ can be written (not necessarily uniquely)
as

$$x = x^1 e_1 + x^2 e_2 + \cdots + x^n e_n.$$ (A.3)

A vector space is finite dimensional if a finite spanning set exists.

iv) A set of vectors $\{e_1, e_2, \ldots, e_n\}$ is a basis if it is a maximal linearly
independent set (i.e. introducing any additional vector makes the set
linearly dependent). An alternative definition declares a basis to be a
minimal spanning set (i.e. deleting any of the $e_i$ destroys the spanning
property). Exercise: Show that these two definitions are equivalent.

v) If $\{e_1, e_2, \ldots, e_n\}$ is a basis then any $x \in V$ can be written

$$x = x^1 e_1 + x^2 e_2 + \cdots + x^n e_n,$$ (A.4)

where the $x^\mu$, the components of the vector with respect to this basis,
are unique in that two vectors coincide if and only if they have the
same components.

vi) Fundamental Theorem: If the sets $\{e_1, e_2, \ldots, e_n\}$ and $\{f_1, f_2, \ldots, f_m\}$
are both bases for the space $V$ then $m = n$. This invariant integer is
the dimension, $\dim(V)$, of the space. For a proof (not difficult) see
a mathematics text such as Birkhoff and Mac Lane’s Survey of Modern
Algebra, or Halmos’ Finite Dimensional Vector Spaces.

Suppose that $\{e_1, e_2, \ldots, e_n\}$ and $\{e'_1, e'_2, \ldots, e'_n\}$ are both bases, and that


$$e_\nu = a^\mu_\nu\, e'_\mu.$$ (A.5)


Since $\{e_1, e_2, \ldots, e_n\}$ is a basis, the $e'_\nu$ can also be uniquely expressed in terms
of the $e_\mu$, and so the numbers $a^\mu_\nu$ constitute an invertible matrix. (Note that
we are, as usual, using the Einstein summation convention that repeated
indices are to be summed over.) The components $x'^\mu$ of $x$ in the new basis
are then found by comparing the coefficients of $e'_\mu$ in


$$x'^\mu e'_\mu = x = x^\nu e_\nu = x^\nu\left(a^\mu_\nu e'_\mu\right) = \left(x^\nu a^\mu_\nu\right)e'_\mu$$ (A.6)

to be $x'^\mu = a^\mu_\nu x^\nu$, or equivalently, $x^\nu = (a^{-1})^\nu_\mu x'^\mu$. Note how the $e_\mu$ and the
$x^\mu$ transform in “opposite” directions. The components $x^\mu$ are therefore said
to transform contravariantly.
A.2 Linear maps


Let $V$ and $W$ be vector spaces having dimensions $n$ and $m$ respectively. A
linear map, or linear operator, $A$ is a function $A : V \to W$ with the property
that

$$A(\lambda x + \mu y) = \lambda A(x) + \mu A(y).$$ (A.7)


A.2.1 Matrices 


The linear map $A$ is an object that exists independently of any basis. Given
bases $\{e_\mu\}$ for $V$ and $\{f_\nu\}$ for $W$, however, the map may be represented by
an $m$-by-$n$ matrix. We obtain this matrix


$$A = \begin{pmatrix} a^1_1 & a^1_2 & \ldots & a^1_n \\ a^2_1 & a^2_2 & \ldots & a^2_n \\ \vdots & \vdots & \ddots & \vdots \\ a^m_1 & a^m_2 & \ldots & a^m_n \end{pmatrix},$$ (A.8)


having entries $a^\nu_\mu$, by looking at the action of $A$ on the basis elements:

$$A(e_\mu) = f_\nu\, a^\nu_\mu.$$ (A.9)


To make the right-hand side of (A.9) look like a matrix product, where we
sum over adjacent indices, the array $a^\nu_\mu$ has been written to the right of the
basis vector.² The map $y = A(x)$ is therefore


$$y \equiv y^\nu f_\nu = A(x) = A(x^\mu e_\mu) = x^\mu A(e_\mu) = x^\mu\left(f_\nu a^\nu_\mu\right) = \left(a^\nu_\mu x^\mu\right)f_\nu,$$ (A.10)

whence, comparing coefficients of $f_\nu$, we have

$$y^\nu = a^\nu_\mu x^\mu.$$ (A.11)


The action of the linear map on components is therefore given by the usual
matrix multiplication from the left: $y = Ax$, or more explicitly


$$\begin{pmatrix} y^1 \\ y^2 \\ \vdots \\ y^m \end{pmatrix} = \begin{pmatrix} a^1_1 & a^1_2 & \ldots & a^1_n \\ a^2_1 & a^2_2 & \ldots & a^2_n \\ \vdots & \vdots & \ddots & \vdots \\ a^m_1 & a^m_2 & \ldots & a^m_n \end{pmatrix}\begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}.$$ (A.12)


² You have probably seen this “backward” action before in quantum mechanics. If we
use Dirac notation $|n\rangle$ for an orthonormal basis, and insert a complete set of states, $|m\rangle\langle m|$,
then $A|n\rangle = |m\rangle\langle m|A|n\rangle$. The matrix $\langle m|A|n\rangle$ representing the operator $A$ operating on
a vector from the left thus automatically appears to the right of the basis vectors used to
expand the result.
The identity map $I : V \to V$ is represented by the $n$-by-$n$ matrix


$$I_n = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix},$$ (A.13)


which has the same entries in any basis.


Exercise A.2: Let $U$, $V$, $W$ be vector spaces, and $A : V \to W$, $B : U \to V$
linear maps which are represented by the matrices $A$ with entries $a^\lambda_\mu$ and $B$
with entries $b^\mu_\nu$, respectively. Use the action of the maps on basis elements
to show that the map $AB : U \to W$ is represented by the matrix product $AB$
whose entries are $a^\lambda_\mu b^\mu_\nu$.


A.2.2  Range-nullspace theorem 


Given a linear map $A : V \to W$, we can define two important subspaces:
i) The kernel or nullspace is defined by


$$\operatorname{Ker} A = \{x \in V : A(x) = 0\}.$$ (A.14)


It is a subspace of $V$.
ii) The range or image space is defined by


$$\operatorname{Im} A = \{y \in W : y = A(x),\ x \in V\}.$$ (A.15)


It is a subspace of the target space $W$.
The key result linking these spaces is the range-nullspace theorem which
states that


$$\dim(\operatorname{Ker} A) + \dim(\operatorname{Im} A) = \dim V.$$


It is proved by taking a basis $n_\mu$ for $\operatorname{Ker} A$ and extending it to a basis for the
whole of $V$ by appending $(\dim V - \dim(\operatorname{Ker} A))$ extra vectors $e_\nu$. It is easy
to see that the vectors $A(e_\nu)$ are linearly independent and span $\operatorname{Im} A \subseteq W$.
Note that this result is meaningless unless $V$ is finite dimensional.

The number $\dim(\operatorname{Im} A)$ is the number of linearly independent columns
in the matrix, and is often called the (column) rank of the matrix.
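The range-nullspace theorem can be seen at work on a small example. The Python sketch below row-reduces a sample 3-by-4 matrix exactly, counts pivots to get the rank, and builds a basis for the kernel from the free columns. The matrix and the function names are illustrative assumptions, and Gauss-Jordan elimination is simply one convenient way of finding the two dimensions; it is not the proof strategy described above.

```python
from fractions import Fraction

def row_reduce(matrix):
    """Return (reduced rows, pivot column indices) using exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column: it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    return rows, pivots

def kernel_basis(matrix):
    """Basis of Ker A: one vector per free (non-pivot) column."""
    rows, pivots = row_reduce(matrix)
    n = len(matrix[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -rows[r][f]
        basis.append(v)
    return basis

def apply(matrix, v):
    """Matrix acting on a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]          # sample map from a 4-dimensional V to a 3-dimensional W

rank = len(row_reduce(A)[1])
kernel = kernel_basis(A)
print(rank, len(kernel), len(A[0]))
```

Here the second row is twice the first, so the rank is 2 and the kernel is 2-dimensional, adding up to $\dim V = 4$.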
A.2.3 The dual space


Associated with the vector space $V$ is its dual space, $V^*$, which is the set of
linear maps $f : V \to F$. In other words, it is the set of linear functions $f(\ )$ that
take in a vector and return a number. These functions are often also called
covectors. (Mathematicians place the prefix co- in front of the name of a
mathematical object to indicate a dual class of objects, consisting of the set
of structure-preserving maps of the original objects into the field over which
they are defined.)
Using linearity we have


$$f(x) = f(x^\mu e_\mu) = x^\mu f(e_\mu) = x^\mu f_\mu.$$ (A.16)


The set of numbers $f_\mu = f(e_\mu)$ are the components of the covector $f \in V^*$.
If we change basis $e_\nu = a^\mu_\nu e'_\mu$, then


$$f_\nu = f(e_\nu) = f(a^\mu_\nu e'_\mu) = a^\mu_\nu f(e'_\mu) = a^\mu_\nu f'_\mu.$$ (A.17)


Thus $f_\nu = a^\mu_\nu f'_\mu$ and the $f_\mu$ components transform in the same manner as the
basis. They are therefore said to transform covariantly.
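The opposite transformation rules for components of vectors and covectors are exactly what keeps the number $f(x) = f_\mu x^\mu$ independent of the basis. The following Python sketch checks this for a sample change-of-basis matrix; the numbers and the function names are illustrative assumptions only.

```python
def transform_contravariant(a, x_old):
    """New vector components: x'^mu = a^mu_nu x^nu, with a[mu][nu] holding a^mu_nu."""
    n = len(x_old)
    return [sum(a[mu][nu] * x_old[nu] for nu in range(n)) for mu in range(len(a))]

def transform_covariant(a, f_new):
    """Old covector components from new ones: f_nu = a^mu_nu f'_mu."""
    n = len(a[0])
    return [sum(a[mu][nu] * f_new[mu] for mu in range(len(a))) for nu in range(n)]

def pairing(f, x):
    """The basis-independent number f(x) = f_mu x^mu."""
    return sum(fi * xi for fi, xi in zip(f, x))

a = [[2, 1],
     [1, 1]]            # sample invertible matrix a^mu_nu
x_old = [3, 4]          # components x^nu in the basis e_nu
f_new = [5, 7]          # components f'_mu in the dual of the basis e'_mu

x_new = transform_contravariant(a, x_old)
f_old = transform_covariant(a, f_new)
print(pairing(f_old, x_old), pairing(f_new, x_new))
```

Both pairings give the same number, because the matrix $a$ appears once in each rule and can be attached to either factor.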

Given a basis $e_\mu$ of $V$, we can define a dual basis for $V^*$ as the set of
covectors $e^{*\mu} \in V^*$ such that


$$e^{*\mu}(e_\nu) = \delta^\mu_\nu.$$ (A.18)

It should be clear that this is a basis for $V^*$, and that $f$ can be expanded

$$f = f_\mu e^{*\mu}.$$ (A.19)


Although the spaces $V$ and $V^*$ have the same dimension, and are therefore
isomorphic, there is no natural map between them. The assignment $e_\mu \mapsto e^{*\mu}$
is unnatural because it depends on the choice of basis.

One way of driving home the distinction between $V$ and $V^*$ is to consider
the space $V$ of fruit orders at a grocer’s. Assume that the grocer stocks only
apples, oranges and pears. The elements of $V$ are then vectors such as


x = 3 kg apples + 4.5 kg oranges + 2 kg pears. (A.20)

Take $V^*$ to be the space of possible price lists, an example element being


f = (£3.00/kg) apples* + (£2.00/kg) oranges* + (£1.50/kg) pears*. (A.21)

The evaluation of $f$ on $x$

f(x) = 3 × £3.00 + 4.5 × £2.00 + 2 × £1.50 = £21.00, (A.22)


then returns the total cost of the order. You should have no difficulty in
distinguishing between a price list and a box of fruit!
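The fruit example translates directly into code. In the Python sketch below an order (a vector) and a price list (a covector) are stored as dictionaries keyed by fruit, and the pairing is the sum over matching keys. The dictionary representation and the function names are our own illustrative choices.

```python
from fractions import Fraction

# Order in kilograms (an element of V) and price list in pounds per kilogram (an element of V*).
order = {"apples": Fraction(3), "oranges": Fraction(9, 2), "pears": Fraction(2)}
prices = {"apples": Fraction(3), "oranges": Fraction(2), "pears": Fraction(3, 2)}

def evaluate(price_list, fruit_order):
    """Pair a covector with a vector: total cost = sum of price times quantity."""
    return sum(price_list[fruit] * quantity for fruit, quantity in fruit_order.items())

def scale(fruit_order, factor):
    """Scalar multiplication in V: multiply every quantity by the same factor."""
    return {fruit: factor * quantity for fruit, quantity in fruit_order.items()}

total = evaluate(prices, order)
doubled_total = evaluate(prices, scale(order, 2))
print(total, doubled_total)
```

Doubling the order doubles the bill, which is the linearity of $f$. Adding an order to a price list, by contrast, has no sensible meaning.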

We may consider the original vector space $V$ to be the dual space of $V^*$
since, given vectors $x \in V$ and $f \in V^*$, we naturally define $x(f)$ to be
$f(x)$. Thus $(V^*)^* = V$. Instead of giving one space priority as being the set
of linear functions on the other, we can treat $V$ and $V^*$ on an equal footing.
We then speak of the pairing of $x \in V$ with $f \in V^*$ to get a number in the
field. It is then common to use the notation $(f, x)$ to mean either of $f(x)$ or
$x(f)$. Warning: despite the similarity of the notation, do not fall into the
trap of thinking of the pairing $(f, x)$ as an inner product (see next section) of
$f$ with $x$. The two objects being paired live in different spaces. In an inner
product, the vectors being multiplied live in the same space.


A.3  Inner-product spaces 


Some vector spaces V come equipped with an inner (or scalar) product. 
This additional structure allows us to relate V and V*. 


A.3.1 Inner products 


We will use the symbol $\langle x, y\rangle$ to denote an inner product. An inner (or
scalar) product is a conjugate-symmetric, sesquilinear, non-degenerate map
$V \times V \to F$. In this string of jargon, the phrase conjugate symmetric means
that


$$\langle x, y\rangle = \langle y, x\rangle^*,$$ (A.23)


where the “$*$” denotes complex conjugation, and sesquilinear³ means


$$\langle x, \lambda y + \mu z\rangle = \lambda\langle x, y\rangle + \mu\langle x, z\rangle,$$ (A.24)
$$\langle \lambda x + \mu y, z\rangle = \lambda^*\langle x, z\rangle + \mu^*\langle y, z\rangle.$$ (A.25)


³ Sesqui is a Latin prefix meaning “one-and-a-half”.

The product is therefore linear in the second slot, but anti-linear in the
first. When our field is the real numbers $\mathbb{R}$ then the complex conjugation is
redundant and the product will be symmetric


$$\langle x, y\rangle = \langle y, x\rangle,$$ (A.26)

and bilinear

$$\langle x, \lambda y + \mu z\rangle = \lambda\langle x, y\rangle + \mu\langle x, z\rangle,$$ (A.27)
$$\langle \lambda x + \mu y, z\rangle = \lambda\langle x, z\rangle + \mu\langle y, z\rangle.$$ (A.28)


The term non-degenerate means that if $\langle x, y\rangle = 0$ for all $y$, then $x = 0$.
Many inner products satisfy the stronger condition of being positive definite.
This means that $\langle x, x\rangle > 0$ unless $x = 0$, in which case $\langle x, x\rangle = 0$. Positive
definiteness implies non-degeneracy, but not vice-versa.

Given a basis $e_\mu$, we can form the pairwise products


$$\langle e_\mu, e_\nu\rangle = g_{\mu\nu}.$$ (A.29)


If the array of numbers $g_{\mu\nu}$ constituting the components of the metric tensor
turns out to be $g_{\mu\nu} = \delta_{\mu\nu}$, then we say that the basis is orthonormal with
respect to the inner product. We will not assume orthonormality without
specifically saying so. The non-degeneracy of the inner product guarantees
the existence of a matrix $g^{\mu\nu}$ which is the inverse of $g_{\mu\nu}$, i.e. $g_{\mu\nu}g^{\nu\lambda} = \delta^\lambda_\mu$.

If we take our field to be the real numbers $\mathbb{R}$ then the additional structure
provided by a non-degenerate inner product allows us to identify $V$ with $V^*$.
For any $f \in V^*$ we can find a vector $\mathbf{f} \in V$ such that


$$f(x) = \langle \mathbf{f}, x\rangle.$$ (A.30)


In components, we solve the equation


$$f_\mu = g_{\mu\nu} f^\nu$$ (A.31)


for $f^\nu$. We find $f^\nu = g^{\nu\mu} f_\mu$. Usually, we simply identify $f$ with $\mathbf{f}$, and hence
$V$ with $V^*$. We say that the covariant components $f_\mu$ are related to the
contravariant components $f^\mu$ by raising


$$f^\mu = g^{\mu\nu} f_\nu,$$ (A.32)

or lowering

$$f_\mu = g_{\mu\nu} f^\nu,$$ (A.33)

the indices using the metric tensor. Obviously, this identification depends
crucially on the inner product; a different inner product would, in general,
identify an $f \in V^*$ with a completely different $\mathbf{f} \in V$.
A.3.2 Euclidean vectors


Consider $\mathbb{R}^n$ equipped with its Euclidean metric and associated “dot” inner
product. Given a vector $x$ and a basis $e_\mu$ with $g_{\mu\nu} = e_\mu \cdot e_\nu$, we can define two
sets of components for the same vector. Firstly the coefficients $x^\mu$ appearing
in the basis expansion

$$x = x^\mu e_\mu,$$


and secondly the “components”

$$x_\mu = e_\mu \cdot x = g_{\mu\nu} x^\nu$$


of $x$ along the basis vectors. The $x_\mu$ are obtained from the $x^\mu$ by the same
“lowering” operation as before, and so $x^\mu$ and $x_\mu$ are naturally referred to
as the contravariant and covariant components, respectively, of the vector $x$.
When the $e_\mu$ constitute an orthonormal basis, then $g_{\mu\nu} = \delta_{\mu\nu}$ and the two
sets of components are numerically coincident.
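A skew basis in the plane shows the two kinds of component coming apart. The Python sketch below uses the illustrative (assumed) basis $e_1 = (1,0)$, $e_2 = (1,1)$ of $\mathbb{R}^2$, builds the metric from dot products, and checks that lowering the contravariant components with $g_{\mu\nu}$ reproduces the dot products of the vector with the basis vectors.

```python
def dot(u, v):
    """Euclidean dot product of two tuples of Cartesian components."""
    return sum(a * b for a, b in zip(u, v))

# A deliberately non-orthonormal basis of R^2 (illustrative choice).
basis = [(1, 0), (1, 1)]
metric = [[dot(basis[m], basis[n]) for n in range(2)] for m in range(2)]

# Contravariant components x^mu, and the resulting Cartesian vector x = x^mu e_mu.
x_contra = [1, 2]
x = tuple(sum(x_contra[m] * basis[m][i] for m in range(2)) for i in range(2))

# Covariant components obtained in two ways: lowering with the metric, and direct projection.
x_co_lowered = [sum(metric[m][n] * x_contra[n] for n in range(2)) for m in range(2)]
x_co_projected = [dot(basis[m], x) for m in range(2)]

print(metric, x, x_co_lowered, x_co_projected)
```

With an orthonormal basis the metric would be the identity matrix and the two lists of components would coincide.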


A.3.3. Bra and ket vectors 


When our vector space is over the field of complex numbers, the anti-linearity
of the first slot of the inner product means we can no longer make a simple
identification of $V$ with $V^*$. Instead there is an anti-linear correspondence
between the two spaces. The vector $x \in V$ is mapped to $\langle x, \ \rangle$ which, since
it returns a number when a vector is inserted into its vacant slot, is an element
of $V^*$. This mapping is anti-linear because


$$\lambda x + \mu y \mapsto \langle \lambda x + \mu y, \ \rangle = \lambda^*\langle x, \ \rangle + \mu^*\langle y, \ \rangle.$$ (A.34)


This antilinear map is probably familiar to you from quantum mechanics,
where $V$ is the space of Dirac’s “ket” vectors $|\psi\rangle$ and $V^*$ the space of
“bra” vectors $\langle\psi|$. The symbol, here $\psi$, in each of these objects is a label
distinguishing one state-vector from another. We often use the eigenvalues
of some complete set of commuting operators. To each vector $|\psi\rangle$ we use
the $(\ldots)^\dagger$ map to assign it a dual vector


$$|\psi\rangle \mapsto |\psi\rangle^\dagger \equiv \langle\psi|$$


having the same labels. The dagger map is defined to be antilinear


$$\left(\lambda|\psi\rangle + \mu|\chi\rangle\right)^\dagger = \lambda^*\langle\psi| + \mu^*\langle\chi|,$$ (A.35)
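and Dirac denoted the number resulting from the pairing of the covector $\langle\psi|$
with the vector $|\chi\rangle$ by the “bra-c-ket” symbol $\langle\psi|\chi\rangle$:


$$\langle\psi|\chi\rangle \stackrel{\rm def}{=} \left(\langle\psi|, |\chi\rangle\right).$$ (A.36)


We can regard the dagger map as either determining the inner product on $V$
via


$$\langle |\psi\rangle, |\chi\rangle\rangle \stackrel{\rm def}{=} \left(|\psi\rangle^\dagger, |\chi\rangle\right) = \left(\langle\psi|, |\chi\rangle\right) \equiv \langle\psi|\chi\rangle,$$ (A.37)


or being determined by it as


$$|\psi\rangle^\dagger \stackrel{\rm def}{=} \langle |\psi\rangle, \ \rangle \equiv \langle\psi|.$$ (A.38)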


When we represent our vectors by their components with respect to an
orthonormal basis, the dagger map is the familiar operation of taking the
conjugate transpose,


$$\begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix} \mapsto \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}^\dagger = \left((x^1)^*, (x^2)^*, \ldots, (x^n)^*\right)$$ (A.39)


but this is not true in general. In a non-orthogonal basis the column vector
with components $x^\mu$ is mapped to the row vector with components $(x^\dagger)_\mu = (x^\nu)^* g_{\nu\mu}$.

Much of Dirac notation tacitly assumes an orthonormal basis. For example,
in the expansion


$$|\psi\rangle = \sum_n |n\rangle\langle n|\psi\rangle$$ (A.40)


the expansion coefficients $\langle n|\psi\rangle$ should be the contravariant components of
$|\psi\rangle$, but the $\langle n|\psi\rangle$ have been obtained from the inner product, and so are in
fact its covariant components. The expansion (A.40) is therefore valid only
when the $|n\rangle$ constitute an orthonormal basis. This will always be the case
when the labels on the states show them to be the eigenvectors of a complete
commuting set of observables, but sometimes, for example, we may use the
integer “$n$” to refer to an orbital centered on a particular atom in a crystal,
and then $\langle n|m\rangle \ne \delta_{mn}$. When using such a non-orthonormal basis it is safer
not to use Dirac notation.




Conjugate operator 


A linear map $A : V \to W$ automatically induces a map $A^* : W^* \to V^*$.
Given $f \in W^*$ we can evaluate $f(A(x))$ for any $x$ in $V$, and so $f(A(\ ))$ is an
element of $V^*$ that we may denote by $A^*(f)$. Thus,


$$A^*(f)(x) = f(A(x)).$$ (A.41)


Functional analysts (people who spend their working day in Banach space)
call $A^*$ the conjugate of $A$. The word “conjugate” and the symbol $A^*$ are
rather unfortunate as they have the potential for generating confusion,⁴ not
least because the $(\ldots)^*$ map is linear. No complex conjugation is involved.
Thus


$$(\lambda A + \mu B)^* = \lambda A^* + \mu B^*.$$ (A.42)


Dirac deftly sidesteps this notational problem by writing $\langle\psi|A$ for the
action of the conjugate of the operator $A : V \to V$ on the bra vector $\langle\psi| \in V^*$.
After setting $f \to \langle\psi|$ and $x \to |\chi\rangle$, equation (A.41) therefore reads


$$\left(\langle\psi|A\right)|\chi\rangle = \langle\psi|\left(A|\chi\rangle\right).$$ (A.43)


This shows that it does not matter where we place the parentheses, so Dirac
simply drops them and uses one symbol $\langle\psi|A|\chi\rangle$ to represent both sides
of (A.43). Dirac notation thus avoids the non-complex-conjugating “$*$” by
suppressing the distinction between an operator and its conjugate. If,
therefore, for some reason we need to make the distinction, we cannot use Dirac
notation.


Exercise A.3: If $A : V \to V$ and $B : V \to V$ show that $(AB)^* = B^*A^*$.


Exercise A.4: How does the reversal of the operator order in the previous
exercise manifest itself in Dirac notation?


Exercise A.5: Show that if the linear operator $A$ is, in a basis $e_\mu$, represented
by the matrix $A$, then the conjugate operator $A^*$ is represented in the dual
basis $e^{*\mu}$ by the transposed matrix $A^T$.


⁴ The terms dual, transpose, or adjoint are sometimes used in place of “conjugate.”
Each of these words brings its own capacity for confusion.




A.3.4 Adjoint operator 


The “conjugate” operator of the previous section does not require an inner
product for its definition, and is a map from $V^*$ to $V^*$. When we do have an
inner product, however, we can use it to define a different operator “conjugate”
to $A$ that, like $A$ itself, is a map from $V$ to $V$. This new conjugate is
called the adjoint or the Hermitian conjugate of $A$. To construct it, we first
remind ourselves that for any linear map $f : V \to \mathbb{C}$, there is a vector $\mathbf{f} \in V$
such that $f(x) = \langle \mathbf{f}, x\rangle$. (To find it we simply solve $f_\nu = (f^\mu)^* g_{\mu\nu}$ for $f^\mu$.)
We next observe that $x \mapsto \langle y, Ax\rangle$ is such a linear map, and so there is a $z$
such that $\langle y, Ax\rangle = \langle z, x\rangle$. It should be clear that $z$ depends linearly on $y$,
so we may define the adjoint linear map, $A^\dagger$, by setting $A^\dagger y = z$. This gives
us the identity


$$\langle y, Ax\rangle = \langle A^\dagger y, x\rangle.$$

The correspondence $A \mapsto A^\dagger$ is anti-linear

$$(\lambda A + \mu B)^\dagger = \lambda^* A^\dagger + \mu^* B^\dagger.$$ (A.44)

The adjoint of $A$ depends on the inner product being used to define it.
Different inner products give different $A^\dagger$’s.
In the particular case that our chosen basis $e_\mu$ is orthonormal with respect
to the inner product, i.e.

$$\langle e_\mu, e_\nu\rangle = \delta_{\mu\nu},$$ (A.45)


then the Hermitian conjugate $A^\dagger$ of the operator $A$ is represented by the
Hermitian conjugate matrix $A^\dagger$ which is obtained from the matrix $A$ by
interchanging rows and columns and complex conjugating the entries.


Exercise A.6: Show that $(AB)^\dagger = B^\dagger A^\dagger$.

Exercise A.7: When the basis is not orthonormal, show that


$$(A^\dagger)^\rho_\sigma = \left(g_{\sigma\mu} A^\mu_\nu\, g^{\nu\rho}\right)^*.$$ (A.46)


A.4 Sums and differences of vector spaces 


A.4.1 Direct sums 


Suppose that $U$ and $V$ are vector spaces. We define their direct sum $U \oplus V$
to be the vector space of ordered pairs $(u, v)$ with


$$\lambda(u_1, v_1) + \mu(u_2, v_2) = (\lambda u_1 + \mu u_2, \lambda v_1 + \mu v_2).$$ (A.47)

The set of vectors $\{(u, 0)\} \subset U \oplus V$ forms a copy of $U$, and $\{(0, v)\} \subset U \oplus V$
a copy of $V$. Thus $U$ and $V$ may be regarded as subspaces of $U \oplus V$.

If $U$ and $V$ are any pair of subspaces of $W$, we can form the space $U + V$
consisting of all elements of $W$ that can be written as $u + v$ with $u \in U$ and
$v \in V$. The decomposition $x = u + v$ of an element $x \in U + V$ into parts in
$U$ and $V$ will be unique (in that $u_1 + v_1 = u_2 + v_2$ implies that $u_1 = u_2$ and
$v_1 = v_2$) if and only if $U \cap V = \{0\}$ where $\{0\}$ is the subspace containing
only the zero vector. In this case $U + V$ can be identified with $U \oplus V$.

If $U$ is a subspace of $W$ then we can seek a complementary space $V$ such
that $W = U \oplus V$, or, equivalently, $W = U + V$ with $U \cap V = \{0\}$. Such
complementary spaces are not unique. Consider $\mathbb{R}^3$, for example, with $U$
being the vectors in the $x, y$ plane. If $e$ is any vector that does not lie in
this plane then the one-dimensional space spanned by $e$ is a complementary
space for $U$.


A.4.2 Quotient spaces 


We have seen that if $U$ is a subspace of $W$ there are many complementary
subspaces $V$ such that $W = U \oplus V$. We can however define a unique space
that we might denote by $W - U$ and refer to as the difference of the two
spaces. It is more common, however, to see this space written as $W/U$ and
referred to as the quotient of $W$ modulo $U$. This quotient space is the vector
space of equivalence classes of vectors, where we do not distinguish between
two vectors in $W$ if their difference lies in $U$. In other words


$$x = y \pmod{U} \iff x - y \in U.$$ (A.48)


The collection of elements in $W$ that are equivalent to $x$ (mod $U$) composes
a coset, written $x + U$, a set whose elements are $x + u$ where $u$ is any vector
in $U$. These cosets are the elements of $W/U$.

When we have a linear map $A : U \to V$, the quotient space $V/\operatorname{Im} A$ is
often called the co-kernel of $A$.

Given a positive-definite inner product, we can define a unique orthogonal
complement of $U \subset W$. We define $U^\perp$ to be the set


$$U^\perp = \{x \in W : \langle x, y\rangle = 0,\ \forall y \in U\}.$$ (A.49)


It is easy to see that this is a linear subspace and that $U \oplus U^\perp = W$. For
finite dimensional spaces

$$\dim W/U = \dim U^\perp = \dim W - \dim U$$


and $(U^\perp)^\perp = U$. For infinite dimensional spaces we only have $(U^\perp)^\perp \supseteq U$.
(Be careful, however. If the inner product is not positive definite, $U$ and $U^\perp$
may have non-zero vectors in common.)

Although they have the same dimensions, do not confuse $W/U$ with $U^\perp$,
and in particular do not use the phrase orthogonal complement without
specifying an inner product.

A practical example of a quotient space occurs in digital imaging. A
colour camera reduces the infinite-dimensional space $\mathcal{L}$ of coloured light
incident on each pixel to three numbers, R, G and B, these obtained by pairing
the spectral intensity with the frequency response (an element of $\mathcal{L}^*$) of the
red, green and blue detectors at that point. The space of distinguishable
colours is therefore only three dimensional. Many different incident spectra
will give the same output RGB signal, and are therefore equivalent as far
as the camera is concerned. In the colour industry these equivalent colours
are called metamers. Equivalent colours differ by spectral intensities that lie
in the space $\mathcal{B}$ of metameric black. There is no inner product here, so it is
meaningless to think of the space of distinguishable colours as being $\mathcal{B}^\perp$. It
is, however, precisely what we mean by $\mathcal{L}/\mathcal{B}$.


A.4.3 Projection-operator decompositions 


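An operator $P : V \to V$ that obeys $P^2 = P$ is called a projection operator.
It projects a vector $\mathbf{x} \in V$ to $P\mathbf{x} \in \operatorname{Im} P$ along $\operatorname{Ker} P$, in the sense of
casting a shadow onto $\operatorname{Im} P$ with the light coming from the direction $\operatorname{Ker} P$.
In other words all vectors lying in $\operatorname{Ker} P$ are killed, whilst any vector already
in $\operatorname{Im} P$ is left alone by $P$. (If $\mathbf{x} \in \operatorname{Im} P$ then $\mathbf{x} = P\mathbf{y}$ for some $\mathbf{y} \in V$, and
$P\mathbf{x} = P^2\mathbf{y} = P\mathbf{y} = \mathbf{x}$.) The only vector common to both $\operatorname{Ker} P$ and $\operatorname{Im} P$ is
$\mathbf{0}$, and so

$$V = \operatorname{Ker} P \oplus \operatorname{Im} P. \tag{A.50}$$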


A set of projection operators $P_i$ that are "orthogonal",

$$P_i P_j = \delta_{ij} P_i, \tag{A.51}$$

and sum to the identity operator,

$$\sum_i P_i = I, \tag{A.52}$$

is called a resolution of the identity. The resulting equation

$$\mathbf{x} = \sum_i P_i \mathbf{x} \tag{A.53}$$

decomposes $\mathbf{x}$ uniquely into a sum of terms $P_i\mathbf{x} \in \operatorname{Im} P_i$ and so decomposes
$V$ into a direct sum of subspaces $V_i = \operatorname{Im} P_i$:

$$V = \bigoplus_i V_i. \tag{A.54}$$
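The defining properties of a projection operator and of a two-term resolution of the identity can be checked on a small example. The sketch below is only an illustration: the 2-by-2 matrix `P1` (an oblique projection onto the x-axis along the line spanned by (1, 1)) and the helper names `matmul` and `apply` are assumptions made here, not objects taken from the text.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(m, v):
    """Apply the matrix m to the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

# Hypothetical oblique projection on R^2:
# Im P1 is the x-axis and Ker P1 is the line spanned by (1, 1).
P1 = [[1, -1],
      [0, 0]]
I2 = [[1, 0],
      [0, 1]]
P2 = [[I2[i][j] - P1[i][j] for j in range(2)] for i in range(2)]  # P2 = I - P1
ZERO = [[0, 0], [0, 0]]

x = [3, 2]
parts = [apply(P1, x), apply(P2, x)]          # pieces lying in Im P1 and Im P2
recombined = [parts[0][k] + parts[1][k] for k in range(2)]

print(matmul(P1, P1) == P1, matmul(P2, P2) == P2)   # both are idempotent
print(matmul(P1, P2) == ZERO)                       # the pair is "orthogonal"
print(parts, recombined)                            # the pieces add back up to x
```

No inner product is used anywhere in the check: the word "orthogonal" here refers only to the vanishing of the products $P_iP_j$ for $i \neq j$.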


Exercise A.8: Let $P_1$ be a projection operator. Show that $P_2 = I - P_1$ is
also a projection operator and $P_1P_2 = 0$. Show also that $\operatorname{Im} P_2 = \operatorname{Ker} P_1$ and
$\operatorname{Ker} P_2 = \operatorname{Im} P_1$.


A.5 Inhomogeneous linear equations 


Suppose we wish to solve the system of linear equations 


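$$\begin{aligned}
a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n &= b_1, \\
a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n &= b_2, \\
&\;\;\vdots \\
a_{m1}y_1 + a_{m2}y_2 + \cdots + a_{mn}y_n &= b_m,
\end{aligned}$$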


or, in matrix notation, 


Ay =b, (A.55) 


where $A$ is the m-by-n matrix with entries $a_{ij}$. Faced with such a problem,
we should start by asking ourselves the questions:

i) Does a solution exist? 

ii) If a solution does exist, is it unique? 
These issues are best addressed by considering the matrix $A$ as a linear
operator $A : V \to W$, where $V$ is n dimensional and $W$ is m dimensional.
The natural language is then that of the range and nullspace of $A$. There
is no solution to the equation $A\mathbf{y} = \mathbf{b}$ when $\operatorname{Im} A$ is not the whole of $W$
and $\mathbf{b}$ does not lie in $\operatorname{Im} A$. Similarly, the solution will not be unique if
there are distinct vectors $\mathbf{x}_1$, $\mathbf{x}_2$ such that $A\mathbf{x}_1 = A\mathbf{x}_2$. This means that
$A(\mathbf{x}_1 - \mathbf{x}_2) = 0$, or $(\mathbf{x}_1 - \mathbf{x}_2) \in \operatorname{Ker} A$. These situations are linked, as we
have seen, by the range-nullspace theorem:

$$\dim(\operatorname{Ker} A) + \dim(\operatorname{Im} A) = \dim V. \tag{A.56}$$

Thus, if $m > n$ there are bound to be some vectors $\mathbf{b}$ for which no solution
exists. When $m < n$ the solution cannot be unique.
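The two questions (existence and uniqueness) can be answered mechanically by comparing ranks: a solution exists when $\mathbf{b}$ adds nothing to the column space, and it is unique when the nullspace is trivial. The following is a minimal sketch using exact rational arithmetic; the function names `rank` and `classify` and the sample matrices are assumptions introduced for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a pivot in column c at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def classify(A, b):
    """Say whether A y = b has no solution, a unique one, or many."""
    n = len(A[0])                                   # dim V
    r_A = rank(A)                                   # dim Im A
    r_Ab = rank([row + [bi] for row, bi in zip(A, b)])
    if r_Ab > r_A:
        return "no solution"                        # b is not in Im A
    # dim Ker A = n - rank A by the range-nullspace theorem.
    return "unique solution" if r_A == n else "non-unique solution"

tall = [[1, 0], [0, 1], [1, 1]]      # m > n: Im A cannot be all of W
wide = [[1, 1, 1], [0, 1, 2]]        # m < n: Ker A cannot be trivial

print(classify(tall, [1, 2, 4]))
print(classify(tall, [1, 2, 3]))
print(classify(wide, [1, 1]))
```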


A.5.1 Rank and index 


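Suppose $V = W$ (so $m = n$ and the matrix is square) and we choose an inner
product, $\langle \mathbf{x}, \mathbf{y} \rangle$, on $V$. Then $\mathbf{x} \in \operatorname{Ker} A$ implies that, for all $\mathbf{y}$,

$$0 = \langle \mathbf{y}, A\mathbf{x} \rangle = \langle A^\dagger\mathbf{y}, \mathbf{x} \rangle, \tag{A.57}$$

or that $\mathbf{x}$ is perpendicular to the range of $A^\dagger$. Conversely, let $\mathbf{x}$ be
perpendicular to the range of $A^\dagger$; then

$$\langle \mathbf{x}, A^\dagger\mathbf{y} \rangle = 0, \quad \forall \mathbf{y} \in V, \tag{A.58}$$

which means that

$$\langle A\mathbf{x}, \mathbf{y} \rangle = 0, \quad \forall \mathbf{y} \in V, \tag{A.59}$$

and, by the non-degeneracy of the inner product, this means that $A\mathbf{x} = 0$.
The net result is that

$$\operatorname{Ker} A = (\operatorname{Im} A^\dagger)^\perp. \tag{A.60}$$

Similarly

$$\operatorname{Ker} A^\dagger = (\operatorname{Im} A)^\perp. \tag{A.61}$$

Now

$$\begin{aligned}
\dim(\operatorname{Ker} A) + \dim(\operatorname{Im} A) &= \dim V, \\
\dim(\operatorname{Ker} A^\dagger) + \dim(\operatorname{Im} A^\dagger) &= \dim V,
\end{aligned} \tag{A.62}$$

but

$$\begin{aligned}
\dim(\operatorname{Ker} A) &= \dim(\operatorname{Im} A^\dagger)^\perp \\
&= \dim V - \dim(\operatorname{Im} A^\dagger) \\
&= \dim(\operatorname{Ker} A^\dagger).
\end{aligned}$$

Thus, for finite-dimensional square matrices, we have

$$\dim(\operatorname{Ker} A) = \dim(\operatorname{Ker} A^\dagger).$$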


In particular, the row and column rank of a square matrix coincide.

Example: Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 2 & 3 & 4 \end{pmatrix}.$$

Clearly, the number of linearly independent rows is two, since the third row
is the sum of the other two. The number of linearly independent columns is
also two, although less obviously so, because

$$-\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} + 2\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}.$$


Warning: The equality $\dim(\operatorname{Ker} A) = \dim(\operatorname{Ker} A^\dagger)$ need not hold in
infinite dimensional spaces. Consider the space with basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots$ indexed
by the positive integers. Define $A\mathbf{e}_1 = \mathbf{e}_2$, $A\mathbf{e}_2 = \mathbf{e}_3$, and so on. This
operator has $\dim(\operatorname{Ker} A) = 0$. The adjoint with respect to the natural inner
product has $A^\dagger\mathbf{e}_1 = 0$, $A^\dagger\mathbf{e}_2 = \mathbf{e}_1$, $A^\dagger\mathbf{e}_3 = \mathbf{e}_2$. Thus $\operatorname{Ker} A^\dagger$ is spanned by $\mathbf{e}_1$, and
$\dim(\operatorname{Ker} A^\dagger) = 1$. The difference $\dim(\operatorname{Ker} A) - \dim(\operatorname{Ker} A^\dagger)$ is called the
index of the operator. The index of an operator is often related to topological
properties of the space on which it acts, and in this way appears in physics
as the origin of anomalies in quantum field theory.


A.5.2 Fredholm alternative 


The results of the previous section can be summarized as saying that the
Fredholm Alternative holds for finite square matrices. The Fredholm
Alternative is the set of statements
I. Either
i) $A\mathbf{x} = \mathbf{b}$ has a unique solution,
or
ii) $A\mathbf{x} = 0$ has a non-trivial solution.
II. If $A\mathbf{x} = 0$ has $n$ linearly independent solutions, then so does $A^\dagger\mathbf{x} = 0$.
III. If alternative ii) holds, then $A\mathbf{x} = \mathbf{b}$ has no solution unless $\mathbf{b}$ is
orthogonal to all solutions of $A^\dagger\mathbf{x} = 0$.
It should be obvious that this is a recasting of the statements that

$$\dim(\operatorname{Ker} A) = \dim(\operatorname{Ker} A^\dagger),$$

and

$$(\operatorname{Ker} A^\dagger)^\perp = \operatorname{Im} A. \tag{A.63}$$

Notice that finite-dimensionality is essential here. Neither of these statements
is guaranteed to be true in infinite dimensional spaces.
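Statement III can be tried out on a singular real matrix, for which the adjoint with respect to the standard inner product is simply the transpose. In the sketch below the matrix `A`, the vector `y` spanning $\operatorname{Ker} A^\dagger$, and the helpers `rank`, `solvable` and `dot` are illustrative assumptions rather than anything fixed by the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def solvable(A, b):
    """A x = b has a solution exactly when appending b does not raise the rank."""
    return rank(A) == rank([row + [bi] for row, bi in zip(A, b)])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, 2],
     [3, 6]]                              # singular, so alternative ii) holds
At = [list(col) for col in zip(*A)]       # adjoint = transpose for real matrices
y = [3, -1]                               # assumed spanning vector of Ker A^T

print([dot(row, y) for row in At])        # A^T y, which should vanish
for b in ([2, 6], [1, 0]):
    # b is admissible only if it is orthogonal to y.
    print(b, dot(y, b) == 0, solvable(A, b))
```

For each right-hand side the two printed booleans agree, which is the content of $(\operatorname{Ker} A^\dagger)^\perp = \operatorname{Im} A$ in this example.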


A.6 Determinants 


A.6.1 Skew-symmetric n-linear forms 


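You will be familiar with the elementary definition of the determinant of an
n-by-n matrix $A$ having entries $a_{ij}$:

$$\det A \equiv \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} \stackrel{\rm def}{=} \epsilon_{i_1 i_2 \dots i_n}\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}. \tag{A.64}$$

Here, $\epsilon_{i_1 i_2 \dots i_n}$ is the Levi-Civita symbol, which is skew-symmetric in all its
indices and $\epsilon_{12\dots n} = 1$. From this definition we see that the determinant
changes sign if any pair of its rows are interchanged, and that it is linear in
each row. In other words

$$\begin{vmatrix} \lambda a_{11} + \mu b_{11} & \lambda a_{12} + \mu b_{12} & \dots & \lambda a_{1n} + \mu b_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \dots & c_{nn} \end{vmatrix}
= \lambda \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \dots & c_{nn} \end{vmatrix}
+ \mu \begin{vmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \dots & c_{nn} \end{vmatrix}.$$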

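If we consider each row as being the components of a vector in an n-dimensional
vector space $V$, we may regard the determinant as being a skew-symmetric
n-linear form, i.e. a map

$$\omega : \overbrace{V \times V \times \cdots \times V}^{n\ \text{factors}} \to \mathbb{F} \tag{A.65}$$

which is linear in each slot,

$$\omega(\lambda\mathbf{a} + \mu\mathbf{b}, \mathbf{c}_2, \ldots, \mathbf{c}_n) = \lambda\,\omega(\mathbf{a}, \mathbf{c}_2, \ldots, \mathbf{c}_n) + \mu\,\omega(\mathbf{b}, \mathbf{c}_2, \ldots, \mathbf{c}_n), \tag{A.66}$$

and changes sign when any two arguments are interchanged,

$$\omega(\ldots, \mathbf{a}_i, \ldots, \mathbf{a}_j, \ldots) = -\omega(\ldots, \mathbf{a}_j, \ldots, \mathbf{a}_i, \ldots). \tag{A.67}$$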


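We will denote the space of skew-symmetric n-linear forms on $V$ by the
symbol $\bigwedge^n(V^*)$. Let $\omega$ be an arbitrary skew-symmetric n-linear form in
$\bigwedge^n(V^*)$, and let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be a basis for $V$. If $\mathbf{a}_i = a_{ij}\mathbf{e}_j$ ($i = 1, \ldots, n$)
is a collection of n vectors,⁵ we compute

$$\begin{aligned}
\omega(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n) &= a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}\, \omega(\mathbf{e}_{i_1}, \mathbf{e}_{i_2}, \ldots, \mathbf{e}_{i_n}) \\
&= a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}\, \epsilon_{i_1 i_2 \dots i_n}\, \omega(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n).
\end{aligned} \tag{A.68}$$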


In the first line we have exploited the linearity of $\omega$ in each slot, and in going
from the first to the second line we have used skew-symmetry to rearrange
the basis vectors in their canonical order. We deduce that all skew-symmetric
n-forms are proportional to the determinant

$$\omega(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n) \propto \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix},$$

and that the proportionality factor is the number $\omega(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$. When
the number of its slots is equal to the dimension of the vector space, there is
therefore essentially only one skew-symmetric multilinear form and $\bigwedge^n(V^*)$
is a one-dimensional vector space.

Now we use the notion of skew-symmetric n-linear forms to give a powerful
definition of the determinant of an endomorphism, i.e. a linear map
$A : V \to V$. Let $\omega$ be a non-zero skew-symmetric n-linear form. The object

$$\omega_A(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) \stackrel{\rm def}{=} \omega(A\mathbf{x}_1, A\mathbf{x}_2, \ldots, A\mathbf{x}_n) \tag{A.69}$$


⁵ The index $j$ on $a_{ij}$ should really be a superscript since $a_{ij}$ is the j-th contravariant
component of the vector $\mathbf{a}_i$. We are writing it as a subscript only for compatibility with
other equations in this section.

is also a skew-symmetric n-linear form. Since there is only one such object
up to multiplicative constants, we must have

$$\omega(A\mathbf{x}_1, A\mathbf{x}_2, \ldots, A\mathbf{x}_n) \propto \omega(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n). \tag{A.70}$$

We define "det A" to be the constant of proportionality. Thus

$$\omega(A\mathbf{x}_1, A\mathbf{x}_2, \ldots, A\mathbf{x}_n) = \det(A)\, \omega(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n). \tag{A.71}$$


By writing this out in a basis where the linear map $A$ is represented by the
matrix $\mathbf{A}$, we easily see that

$$\det A = \det \mathbf{A}. \tag{A.72}$$

The new definition is therefore compatible with the old one. The advantage
of this more sophisticated definition is that it makes no appeal to a basis, and
so shows that the determinant of an endomorphism is a basis-independent
concept. A byproduct is an easy proof that $\det(AB) = \det(A)\det(B)$, a
result that is not so easy to establish with the elementary definition. We
write

$$\begin{aligned}
\det(AB)\, \omega(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) &= \omega(AB\mathbf{x}_1, AB\mathbf{x}_2, \ldots, AB\mathbf{x}_n) \\
&= \omega(A(B\mathbf{x}_1), A(B\mathbf{x}_2), \ldots, A(B\mathbf{x}_n)) \\
&= \det(A)\, \omega(B\mathbf{x}_1, B\mathbf{x}_2, \ldots, B\mathbf{x}_n) \\
&= \det(A)\det(B)\, \omega(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n).
\end{aligned} \tag{A.73}$$

Cancelling the common factor of $\omega(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ completes the proof.
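The elementary Levi-Civita definition (A.64) translates directly into a sum over permutations, and the product rule can then be spot-checked numerically. This is a sketch for small matrices only (the sum has n! terms); the names `sign`, `det`, `matmul` and the two integer test matrices are assumptions made for the illustration.

```python
from itertools import permutations

def sign(perm):
    """Value of the Levi-Civita symbol on a permutation: +1 if even, -1 if odd."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant as the sum over permutations of sign * a[0][p0] * a[1][p1] * ..."""
    total = 0
    for perm in permutations(range(len(a))):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
B = [[1, 2, 0],
     [0, 1, 1],
     [1, 0, 1]]

print(det(A), det(B), det(matmul(A, B)))   # the last number is the product of the first two
```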


Exercise A.9: Let $\omega$ be a skew-symmetric n-linear form on an n-dimensional
vector space. Assuming that $\omega$ does not vanish identically, show that a set of
n vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ is linearly independent, and hence forms a basis, if,
and only if, $\omega(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) \neq 0$.


Exercise A.10: Extend the pairing between $V$ and its dual space $V^*$ to a
pairing between the one-dimensional $\bigwedge^n(V^*)$ and its dual space. Use this
pairing, together with the result of exercise A.5, to show that

$$\det \mathbf{A}^T = \det A^* = [\det A]^* = [\det \mathbf{A}]^T = \det \mathbf{A} = \det A,$$


where the "$*$" denotes the conjugate operator (and not complex conjugation)
and the penultimate equality holds because transposition has no effect on a
one-by-one matrix. Conclude that $\det \mathbf{A} = \det \mathbf{A}^T$. A determinant is therefore
unaffected by the interchange of its rows with its columns.


Exercise A.11: Cauchy-Binet formula. Let $A$ be an m-by-n matrix and $B$ be
an n-by-m matrix. The matrix product $AB$ is therefore defined, and is an
m-by-m matrix. Let $S$ be a subset of $\{1, \ldots, n\}$ with m elements, and let $A_S$
be the m-by-m matrix whose columns are the columns of $A$ corresponding to
indices in $S$. Similarly let $B_S$ be the m-by-m matrix whose rows are the rows
of $B$ with indices in $S$. Show that

$$\det AB = \sum_S \det A_S \det B_S,$$

where the sum is over all $n!/m!(n-m)!$ subsets $S$. If $m > n$ there are
no such subsets. Show that in this case $\det AB = 0$.
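A numerical spot check is no substitute for the proof requested in the exercise, but it helps to fix the meaning of $A_S$ and $B_S$. The sketch below uses a 2-by-3 matrix `A` and a 3-by-2 matrix `B` chosen arbitrarily for illustration; `det`, `matmul` and `cauchy_binet_sum` are hypothetical helper names.

```python
from itertools import combinations, permutations

def det(a):
    """Determinant by the permutation (Levi-Civita) sum; fine for small matrices."""
    total = 0
    n = len(a)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def cauchy_binet_sum(A, B):
    """Sum of det(A_S) det(B_S) over all m-element subsets S of the n column indices."""
    m, n = len(A), len(A[0])
    total = 0
    for S in combinations(range(n), m):
        A_S = [[row[j] for j in S] for row in A]   # keep the columns of A in S
        B_S = [B[j] for j in S]                    # keep the rows of B in S
        total += det(A_S) * det(B_S)
    return total

A = [[1, 2, 3],
     [4, 5, 6]]        # m = 2, n = 3
B = [[1, 0],
     [2, 1],
     [0, 3]]           # n = 3, m = 2

print(det(matmul(A, B)), cauchy_binet_sum(A, B))   # the two numbers agree
print(det(matmul(B, A)))                           # 3-by-3 product of rank <= 2
```

The last line illustrates the $m > n$ case by exchanging the roles of the two matrices: the sum over subsets is then empty and the determinant of the product vanishes.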


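Exercise A.12: Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

be a partitioned matrix where $a$ is m-by-m, $b$ is m-by-n, $c$ is n-by-m, and $d$
is n-by-n. By making a Gaussian decomposition

$$A = \begin{pmatrix} I_m & x \\ 0 & I_n \end{pmatrix} \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} \begin{pmatrix} I_m & 0 \\ y & I_n \end{pmatrix},$$

show that, for invertible $d$, we have Schur's determinant formula⁶

$$\det A = \det(d)\, \det(a - b d^{-1} c).$$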


A.6.2 The adjugate matrix 


Given a square matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} \tag{A.74}$$

and an element $a_{ij}$, we define the corresponding minor $M_{ij}$ to be the
determinant of the (n − 1)-by-(n − 1) matrix constructed by deleting from $A$ the
row and column containing $a_{ij}$. The number

$$A_{ij} = (-1)^{i+j} M_{ij} \tag{A.75}$$


⁶ I. Schur, J. für reine und angewandte Math., 147 (1917) 205-232.

is then called the co-factor of the element $a_{ij}$. (It is traditional to use
uppercase letters to denote co-factors.) The basic result involving co-factors is
that

$$\sum_j a_{ij} A_{i'j} = \delta_{ii'} \det A. \tag{A.76}$$

When $i = i'$, this is known as the Laplace development of the determinant
about row $i$. We get zero when $i \neq i'$ because we are effectively developing
a determinant with two equal rows. We now define the adjugate matrix,⁷
$\operatorname{Adj} A$, to be the transposed matrix of the co-factors:

$$(\operatorname{Adj} A)_{ij} = A_{ji}. \tag{A.77}$$

In terms of this we have

$$A\,(\operatorname{Adj} A) = (\det A)\, I. \tag{A.78}$$

In other words

$$A^{-1} = \frac{1}{\det A} \operatorname{Adj} A. \tag{A.79}$$

Each entry in the adjugate matrix is a polynomial of degree n − 1 in the
entries of the original matrix. Thus, no division is required to form it, and
the adjugate matrix exists even if the inverse matrix does not.
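The construction of the adjugate from co-factors, and the fact that it still exists for a singular matrix, can be sketched as follows. The helper names (`minor`, `det`, `adjugate`, `matmul`) and the two integer matrices are assumptions chosen for the illustration; the determinant is computed by Laplace development about the first row.

```python
def minor(a, i, j):
    """Matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Laplace development about the first row (the empty determinant is 1)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def adjugate(a):
    """(Adj A)_ij = A_ji = (-1)^(i+j) M_ji: the transposed matrix of co-factors."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]          # invertible
S = [[1, 2],
     [2, 4]]             # singular: no inverse, but the adjugate is still defined

print(det(A), matmul(A, adjugate(A)))    # (det A) times the identity
print(adjugate(S), matmul(S, adjugate(S)))
```

Only additions and multiplications of integers occur, in line with the remark that no division is needed to form the adjugate.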


Exercise A.13: It is possible to Laplace-develop a determinant about a set of
rows. For example, the development of a 4-by-4 determinant about its first
two rows is given by:

$$\begin{aligned}
\begin{vmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ a_4 & b_4 & c_4 & d_4 \end{vmatrix}
={}& \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} \begin{vmatrix} c_3 & d_3 \\ c_4 & d_4 \end{vmatrix}
- \begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix} \begin{vmatrix} b_3 & d_3 \\ b_4 & d_4 \end{vmatrix}
+ \begin{vmatrix} a_1 & d_1 \\ a_2 & d_2 \end{vmatrix} \begin{vmatrix} b_3 & c_3 \\ b_4 & c_4 \end{vmatrix} \\
&+ \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix} \begin{vmatrix} a_3 & d_3 \\ a_4 & d_4 \end{vmatrix}
- \begin{vmatrix} b_1 & d_1 \\ b_2 & d_2 \end{vmatrix} \begin{vmatrix} a_3 & c_3 \\ a_4 & c_4 \end{vmatrix}
+ \begin{vmatrix} c_1 & d_1 \\ c_2 & d_2 \end{vmatrix} \begin{vmatrix} a_3 & b_3 \\ a_4 & b_4 \end{vmatrix}.
\end{aligned}$$

Understand why this formula is correct, and, using that insight, describe the
general rule.


⁷ Some authors rather confusingly call this the adjoint matrix.


Exercise A.14: Sylvester's Lemma.⁸ Let $A$ and $B$ be two n-by-n matrices.
Show that

$$\det A \det B = \sum \det A' \det B'$$

where $A'$ and $B'$ are constructed by selecting a fixed set of $k < n$ columns of $B$
(which we can, without loss of generality, take to be the first $k$ columns) and
interchanging them with $k$ columns of $A$, preserving the order of the columns.
The sum is over all $n!/k!(n-k)!$ ways of choosing columns of $A$. (Hint: Show
that, without loss of generality, we can take the columns of $A$ to be a set of
basis vectors, and that, in this case, the lemma becomes a re-statement of your
"general rule" from the previous problem.)


Cayley’s theorem 


You will know that the possible eigenvalues of the n-by-n matrix $A$ are given
by the roots of its characteristic equation

$$0 = \det(A - \lambda I) = (-1)^n \left( \lambda^n - \operatorname{tr}(A)\lambda^{n-1} + \cdots + (-1)^n \det(A) \right), \tag{A.80}$$

and have probably met with Cayley's theorem that asserts that every matrix
obeys its own characteristic equation:

$$A^n - \operatorname{tr}(A)A^{n-1} + \cdots + (-1)^n \det(A)\, I = 0. \tag{A.81}$$

The proof of Cayley's theorem involves the adjugate matrix. We write

$$\det(A - \lambda I) = (-1)^n \left( \lambda^n + \alpha_1\lambda^{n-1} + \cdots + \alpha_n \right) \tag{A.82}$$

and observe that

$$\det(A - \lambda I)\, I = (A - \lambda I)\operatorname{Adj}(A - \lambda I). \tag{A.83}$$

Now $\operatorname{Adj}(A - \lambda I)$ is a matrix-valued polynomial in $\lambda$ of degree n − 1, and it
can be written

$$\operatorname{Adj}(A - \lambda I) = C_0\lambda^{n-1} + C_1\lambda^{n-2} + \cdots + C_{n-1}, \tag{A.84}$$

for some matrix coefficients $C_i$. On multiplying out the equation

$$(-1)^n \left( \lambda^n + \alpha_1\lambda^{n-1} + \cdots + \alpha_n \right) I = (A - \lambda I)(C_0\lambda^{n-1} + C_1\lambda^{n-2} + \cdots + C_{n-1}) \tag{A.85}$$


⁸ J. J. Sylvester, Phil. Mag. 1 (1851) 295-305.

and comparing like powers of $\lambda$, we find the relations

$$\begin{aligned}
(-1)^n I &= -C_0, \\
(-1)^n \alpha_1 I &= -C_1 + AC_0, \\
(-1)^n \alpha_2 I &= -C_2 + AC_1, \\
&\;\;\vdots \\
(-1)^n \alpha_{n-1} I &= -C_{n-1} + AC_{n-2}, \\
(-1)^n \alpha_n I &= AC_{n-1}.
\end{aligned}$$

Multiply the first equation on the left by $A^n$, the second by $A^{n-1}$, and so
on down to the last equation, which we multiply by $A^0 = I$. Now add. We find
that the sum telescopes to give Cayley's theorem,

$$A^n + \alpha_1 A^{n-1} + \cdots + \alpha_n I = 0,$$

as advertised.
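Cayley's theorem is easy to test on a concrete matrix once the coefficients $\alpha_k$ are known. The sketch below obtains them with the Faddeev-LeVerrier recursion, which is a different technique from the adjugate argument above and is used here only because it is short; the function names and the 3-by-3 test matrix are assumptions made for the illustration.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def char_poly(a):
    """Coefficients [1, alpha_1, ..., alpha_n] of det(lambda*I - A), highest power first,
    computed with the Faddeev-LeVerrier recursion."""
    n = len(a)
    coeffs = [Fraction(1)]
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # M_1 = I
    for k in range(1, n + 1):
        am = matmul(a, m)
        c = -sum(am[i][i] for i in range(n)) / k          # alpha_k = -tr(A M_k) / k
        coeffs.append(c)
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]                           # M_{k+1} = A M_k + alpha_k I
    return coeffs

def poly_at_matrix(coeffs, a):
    """Evaluate A^n + alpha_1 A^(n-1) + ... + alpha_n I by Horner's rule."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, a)
        for i in range(n):
            result[i][i] += c
    return result

A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]

alphas = char_poly(A)
print(alphas)                      # alpha_1 = -tr A, alpha_3 = (-1)^3 det A
print(poly_at_matrix(alphas, A))   # the zero matrix
```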


A.6.3 Differentiating determinants 


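Suppose that the elements of $A$ depend on some parameter $x$. From the
elementary definition

$$\det A = \epsilon_{i_1 i_2 \dots i_n}\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n},$$

we find

$$\frac{d}{dx} \det A = \epsilon_{i_1 i_2 \dots i_n} \left( a'_{1 i_1} a_{2 i_2} \cdots a_{n i_n} + a_{1 i_1} a'_{2 i_2} \cdots a_{n i_n} + \cdots + a_{1 i_1} a_{2 i_2} \cdots a'_{n i_n} \right). \tag{A.86}$$

In other words,

$$\frac{d}{dx} \det A = \begin{vmatrix} a'_{11} & a'_{12} & \dots & a'_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}
+ \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a'_{21} & a'_{22} & \dots & a'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}
+ \cdots
+ \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{n1} & a'_{n2} & \dots & a'_{nn} \end{vmatrix}. \tag{A.87}$$

The same result can also be written more compactly as

$$\frac{d}{dx} \det A = \sum_{ij} \frac{da_{ij}}{dx} A_{ij}, \tag{A.88}$$

where $A_{ij}$ is the cofactor of $a_{ij}$. Using the connection between the adjugate
matrix and the inverse, this is equivalent to

$$\frac{1}{\det A} \frac{d}{dx} \det A = \operatorname{tr}\left\{ \frac{dA}{dx} A^{-1} \right\},$$

or

$$\frac{d}{dx} \ln(\det A) = \operatorname{tr}\left\{ \frac{dA}{dx} A^{-1} \right\}. \tag{A.89}$$

A special case of this formula is the result

$$\frac{\partial}{\partial a_{ij}} \ln(\det A) = \left( A^{-1} \right)_{ji}. \tag{A.90}$$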

A.7 Diagonalization and canonical forms 


An essential part of the linear algebra tool-kit is the set of techniques for the 
reduction of a matrix to its simplest, canonical form. This is often a diagonal 
matrix. 


A.7.1  Diagonalizing linear maps 


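A common task is the diagonalization of a matrix $\mathbf{A}$ representing a linear
map $A$. Let us recall some standard material relating to this:
i) If $A\mathbf{x} = \lambda\mathbf{x}$ for a non-zero vector $\mathbf{x}$, then $\mathbf{x}$ is said to be an eigenvector
of $A$ with eigenvalue $\lambda$.
ii) A linear operator $A$ on a finite-dimensional vector space is said to be
self-adjoint, or Hermitian, with respect to the inner product $\langle\ ,\ \rangle$ if
$A = A^\dagger$, or equivalently if $\langle \mathbf{x}, A\mathbf{y} \rangle = \langle A\mathbf{x}, \mathbf{y} \rangle$ for all $\mathbf{x}$ and $\mathbf{y}$.
iii) If $A$ is Hermitian with respect to a positive definite inner product $\langle\ ,\ \rangle$
then all the eigenvalues $\lambda$ are real. To see that this is so, we write

$$\lambda \langle \mathbf{x}, \mathbf{x} \rangle = \langle \mathbf{x}, \lambda\mathbf{x} \rangle = \langle \mathbf{x}, A\mathbf{x} \rangle = \langle A\mathbf{x}, \mathbf{x} \rangle = \langle \lambda\mathbf{x}, \mathbf{x} \rangle = \lambda^* \langle \mathbf{x}, \mathbf{x} \rangle. \tag{A.91}$$

Because the inner product is positive definite and $\mathbf{x}$ is not zero, the
factor $\langle \mathbf{x}, \mathbf{x} \rangle$ cannot be zero. We conclude that $\lambda = \lambda^*$.

iv) If $A$ is Hermitian and $\lambda_i$ and $\lambda_j$ are two distinct eigenvalues with
eigenvectors $\mathbf{x}_i$ and $\mathbf{x}_j$, respectively, then $\langle \mathbf{x}_i, \mathbf{x}_j \rangle = 0$. To prove this, we
write

$$\lambda_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle = \langle \mathbf{x}_i, A\mathbf{x}_j \rangle = \langle A\mathbf{x}_i, \mathbf{x}_j \rangle = \langle \lambda_i\mathbf{x}_i, \mathbf{x}_j \rangle = \lambda_i^* \langle \mathbf{x}_i, \mathbf{x}_j \rangle. \tag{A.92}$$

But $\lambda_i^* = \lambda_i$, and so $(\lambda_i - \lambda_j) \langle \mathbf{x}_i, \mathbf{x}_j \rangle = 0$. Since, by assumption,
$(\lambda_i - \lambda_j) \neq 0$ we must have $\langle \mathbf{x}_i, \mathbf{x}_j \rangle = 0$.

v) An operator $A$ is said to be diagonalizable if we can find a basis for $V$
that consists of eigenvectors of $A$. In this basis, $A$ is represented by the
matrix $\mathbf{A} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, where the $\lambda_i$ are the eigenvalues.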

Not all linear operators can be diagonalized. The key element determining
the diagonalizability of a matrix is the minimal polynomial equation obeyed
by the matrix representing the operator. As mentioned in the previous
section, the possible eigenvalues of an N-by-N matrix $A$ are given by the roots of
the characteristic equation

$$0 = \det(A - \lambda I) = (-1)^N \left( \lambda^N - \operatorname{tr}(A)\lambda^{N-1} + \cdots + (-1)^N \det(A) \right).$$

This is because a non-trivial solution to the equation

$$A\mathbf{x} = \lambda\mathbf{x} \tag{A.93}$$

requires the matrix $A - \lambda I$ to have a non-trivial nullspace, and so $\det(A - \lambda I)$
must vanish. Cayley's Theorem, which we proved in the previous section,
asserts that every matrix obeys its own characteristic equation:

$$A^N - \operatorname{tr}(A)A^{N-1} + \cdots + (-1)^N \det(A)\, I = 0.$$

The matrix $A$ may, however, satisfy an equation of lower degree. For
example, the characteristic equation of the matrix

$$A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_1 \end{pmatrix} \tag{A.94}$$

is $(\lambda - \lambda_1)^2 = 0$. Cayley therefore asserts that $(A - \lambda_1 I)^2 = 0$. This is clearly
true, but $A$ also satisfies the equation of first degree $(A - \lambda_1 I) = 0$.

The equation of lowest degree satisfied by $A$ is said to be the minimal
polynomial equation. It is unique up to an overall numerical factor: if two
distinct minimal equations of degree $n$ were to exist, and if we normalize
them so that the coefficients of $A^n$ coincide, then their difference, if
non-zero, would be an equation of degree $\leq (n - 1)$ obeyed by $A$, in
contradiction to the minimal equation having degree $n$.

If

$$P(A) \equiv (A - \lambda_1 I)^{\alpha_1} (A - \lambda_2 I)^{\alpha_2} \cdots (A - \lambda_n I)^{\alpha_n} = 0 \tag{A.95}$$

is the minimal equation then each root $\lambda_i$ is an eigenvalue of $A$. To prove
this, we select one factor of $(A - \lambda_i I)$ and write

$$P(A) = (A - \lambda_i I)Q(A), \tag{A.96}$$

where $Q(A)$ contains all the remaining factors in $P(A)$. We now observe
that there must be some vector $\mathbf{y}$ such that $\mathbf{x} = Q(A)\mathbf{y}$ is not zero. If there
were no such $\mathbf{y}$ then $Q(A) = 0$ would be an equation of lower degree obeyed
by $A$ in contradiction to the assumed minimality of $P(A)$. Since

$$0 = P(A)\mathbf{y} = (A - \lambda_i I)\mathbf{x} \tag{A.97}$$

we see that $\mathbf{x}$ is an eigenvector of $A$ with eigenvalue $\lambda_i$.

Because all possible eigenvalues appear as roots of the characteristic
equation, the minimal equation must have the same roots as the characteristic
equation, but with equal or lower multiplicities $\alpha_i$.

In the special case that $A$ is self-adjoint, or Hermitian, with respect to a
positive definite inner product $\langle\ ,\ \rangle$ the minimal equation has no repeated
roots. Suppose that this were not so, and that $A$ has minimal equation
$(A - \lambda I)^2 R(A) = 0$ where $R(A)$ is a polynomial in $A$. Then, for all vectors
$\mathbf{x}$ we have

$$0 = \langle R\mathbf{x}, (A - \lambda I)^2 R\mathbf{x} \rangle = \langle (A - \lambda I)R\mathbf{x}, (A - \lambda I)R\mathbf{x} \rangle. \tag{A.98}$$

Now the vanishing of the rightmost expression shows that $(A - \lambda I)R(A)\mathbf{x} = 0$
for all $\mathbf{x}$. In other words

$$(A - \lambda I)R(A) = 0. \tag{A.99}$$

The equation with the repeated factor was therefore not minimal, and we
have a contradiction.

If the equation of lowest degree satisfied by the matrix has no repeated
roots, the matrix is diagonalizable; if there are repeated roots, it is not. The
last statement should be obvious, because a diagonalized matrix satisfies an
equation with no repeated roots, and this equation will hold in all bases,
including the original one. The first statement, in combination with
the observation that the minimal equation for a Hermitian matrix has no
repeated roots, shows that a Hermitian (with respect to a positive definite
inner product) matrix can be diagonalized.




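To establish the first statement, suppose that $A$ obeys the equation

$$0 = P(A) \equiv (A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I), \tag{A.100}$$

where the $\lambda_i$ are all distinct. Then, setting $x \to A$ in the identity⁹

$$1 = \frac{(x - \lambda_2)(x - \lambda_3) \cdots (x - \lambda_n)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3) \cdots (\lambda_1 - \lambda_n)} + \frac{(x - \lambda_1)(x - \lambda_3) \cdots (x - \lambda_n)}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3) \cdots (\lambda_2 - \lambda_n)} + \cdots + \frac{(x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_{n-1})}{(\lambda_n - \lambda_1)(\lambda_n - \lambda_2) \cdots (\lambda_n - \lambda_{n-1})}, \tag{A.101}$$

where in each term one of the factors of the polynomial is omitted in both
numerator and denominator, we may write

$$I = P_1 + P_2 + \cdots + P_n, \tag{A.102}$$

where

$$P_1 = \frac{(A - \lambda_2 I)(A - \lambda_3 I) \cdots (A - \lambda_n I)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3) \cdots (\lambda_1 - \lambda_n)}, \tag{A.103}$$

etc. Clearly $P_iP_j = 0$ if $i \neq j$, because the product contains the minimal
equation as a factor. Multiplying (A.102) by $P_i$ therefore gives $P_i^2 = P_i$,
showing that the $P_i$ are projection operators. Further $(A - \lambda_i I)(P_i) = 0$, so

$$(A - \lambda_i I)(P_i\mathbf{x}) = 0 \tag{A.104}$$

for any vector $\mathbf{x}$, and we see that $P_i\mathbf{x}$, if not zero, is an eigenvector with
eigenvalue $\lambda_i$. Thus $P_i$ projects into the i-th eigenspace. Applying the
resolution of the identity (A.102) to a vector $\mathbf{x}$ shows that it can be decomposed

$$\begin{aligned}
\mathbf{x} &= P_1\mathbf{x} + P_2\mathbf{x} + \cdots + P_n\mathbf{x} \\
&= \mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_n,
\end{aligned} \tag{A.105}$$

where $\mathbf{x}_i$, if not zero, is an eigenvector with eigenvalue $\lambda_i$. Since any $\mathbf{x}$ can
be written as a sum of eigenvectors, the eigenvectors span the space.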
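The construction of the eigenspace projectors from the minimal equation can be carried out explicitly for a small matrix. The sketch below assumes a hypothetical non-symmetric 3-by-3 matrix whose eigenvalues 2, 3 and 5 can be read off its triangular form; the helper names are likewise illustrative. Exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def projector(a, eigs, i):
    """P_i = product over j != i of (A - lambda_j I) / (lambda_i - lambda_j)."""
    n = len(a)
    p = identity(n)
    for j, lam in enumerate(eigs):
        if j == i:
            continue
        scale = Fraction(eigs[i] - lam)
        factor = [[(a[r][c] - (lam if r == c else 0)) / scale for c in range(n)]
                  for r in range(n)]
        p = matmul(p, factor)
    return p

A = [[2, 0, 0],
     [1, 3, 0],
     [0, 0, 5]]            # triangular, so the eigenvalues are the diagonal entries
eigs = [2, 3, 5]           # distinct, hence the minimal equation has no repeated roots

P = [projector(A, eigs, i) for i in range(3)]
total = [[sum(P[k][r][c] for k in range(3)) for c in range(3)] for r in range(3)]

print(total == identity(3))                        # resolution of the identity
print(all(matmul(P[i], P[j]) == ([[0] * 3] * 3 if i != j else P[i])
          for i in range(3) for j in range(3)))    # P_i P_j = delta_ij P_i
print(all(matmul(A, P[i]) == [[eigs[i] * v for v in row] for row in P[i]]
          for i in range(3)))                      # A P_i = lambda_i P_i
```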


⁹ The identity may be verified by observing that the difference of the left and right hand
sides is a polynomial of degree n − 1, which, by inspection, vanishes at the n points $x = \lambda_i$.
But a polynomial that has more zeros than its degree must be identically zero.


Jordan decomposition


If the minimal polynomial has repeated roots, the matrix can still be
reduced to the Jordan canonical form, which is diagonal except for some 1's
immediately above the diagonal.

For example, suppose the characteristic equation for a 6-by-6 matrix $A$
is

$$0 = \det(A - \lambda I) = (\lambda_1 - \lambda)^4 (\lambda_2 - \lambda)^2, \tag{A.106}$$

but the minimal equation is

$$0 = (\lambda_1 - \lambda)^3 (\lambda_2 - \lambda)^2. \tag{A.107}$$

Then the Jordan form of $A$ might be

$$T^{-1}AT = \begin{pmatrix}
\lambda_1 & 1 & 0 & 0 & 0 & 0 \\
0 & \lambda_1 & 1 & 0 & 0 & 0 \\
0 & 0 & \lambda_1 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda_1 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda_2 & 1 \\
0 & 0 & 0 & 0 & 0 & \lambda_2
\end{pmatrix}. \tag{A.108}$$

One may easily see that (A.107) is the minimal equation for this matrix. The
minimal equation alone does not uniquely specify the pattern of $\lambda_i$'s and 1's
in the Jordan form, though.
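How the minimal equation is controlled by the largest Jordan block of each eigenvalue can be seen in a small computation. The sketch below assumes an illustrative 6-by-6 Jordan matrix with blocks of size 3 and 1 for one eigenvalue and a single block of size 2 for the other; the numerical values 2 and 5 for the eigenvalues and the helper names are arbitrary choices made here.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def shifted(m, lam):
    """Return m - lam * I."""
    n = len(m)
    return [[m[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def power(m, p):
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = matmul(result, m)
    return result

def is_zero(m):
    return all(v == 0 for row in m for v in row)

l1, l2 = 2, 5     # hypothetical numerical eigenvalues
J = [[l1, 1, 0, 0, 0, 0],
     [0, l1, 1, 0, 0, 0],
     [0, 0, l1, 0, 0, 0],
     [0, 0, 0, l1, 0, 0],
     [0, 0, 0, 0, l2, 1],
     [0, 0, 0, 0, 0, l2]]

def candidate(a, b):
    """(J - l1 I)^a (J - l2 I)^b for trial exponents a and b."""
    return matmul(power(shifted(J, l1), a), power(shifted(J, l2), b))

print(is_zero(candidate(3, 2)))   # exponents equal to the largest block sizes
print(is_zero(candidate(2, 2)))   # lowering the first exponent
print(is_zero(candidate(3, 1)))   # lowering the second exponent
```

Only the first product vanishes, so for this matrix the equation of lowest degree has exponents 3 and 2 even though the characteristic equation has exponents 4 and 2.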

It is rather tedious, but quite straightforward, to show that any linear 
map can be reduced to a Jordan form. The proof is sketched in the following 
exercises: 


Exercise A.15: Suppose that the linear operator $T$ is represented by an $N \times N$
matrix, where $N > 1$. $T$ obeys the equation

$$(T - \lambda I)^p = 0,$$

with $p = N$, but does not obey this equation for any $p < N$. Here $\lambda$ is a
number and $I$ is the identity operator.


i) Show that if $T$ has an eigenvector, the corresponding eigenvalue must be
$\lambda$. Deduce that $T$ cannot be diagonalized.

ii) Show that there exists a vector $\mathbf{e}_1$ such that $(T - \lambda I)^N \mathbf{e}_1 = 0$, but no
lesser power of $(T - \lambda I)$ kills $\mathbf{e}_1$.

iii) Define $\mathbf{e}_2 = (T - \lambda I)\mathbf{e}_1$, $\mathbf{e}_3 = (T - \lambda I)^2\mathbf{e}_1$, etc. up to $\mathbf{e}_N$. Show that the
vectors $\mathbf{e}_1, \ldots, \mathbf{e}_N$ are linearly independent.

iv) Use $\mathbf{e}_1, \ldots, \mathbf{e}_N$ as a basis for your vector space. Taking

$$\mathbf{e}_1 = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{e}_N = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$

write out the matrix representing $T$ in the $\mathbf{e}_i$ basis.


Exercise A.16: Let $T : V \to V$ be a linear map, and suppose that the minimal
polynomial equation satisfied by $T$ is

$$Q(T) = \prod_{i=1}^{n} (T - \lambda_i I)^{r_i} = 0.$$

Let $V_{\lambda_i}$ denote the space of generalized eigenvectors for the eigenvalue $\lambda_i$. This
is the set of $\mathbf{x}$ such that $(T - \lambda_i I)^{r_i}\mathbf{x} = 0$. You will show that

$$V = \bigoplus_i V_{\lambda_i}.$$

i) Consider the set of polynomials $Q_{\lambda_i,j}(t) = (t - \lambda_i)^{-(r_i - j + 1)} Q(t)$ where
$j = 1, \ldots, r_i$. Show that this set of $N \equiv \sum_i r_i$ polynomials forms a
basis for the vector space $F_{N-1}(t)$ of polynomials in $t$ of degree no more
than $N - 1$. (Since the number of $Q_{\lambda_i,j}$ is $N$, and this is equal to the
dimension of $F_{N-1}(t)$, the claim will be established if you can show that
the polynomials are linearly independent. This is easy to do: suppose
that

$$\sum_{\lambda_i, j} \alpha_{\lambda_i,j}\, Q_{\lambda_i,j}(t) = 0.$$

Set $t = \lambda_i$ and deduce that $\alpha_{\lambda_i,1} = 0$. Knowing this, differentiate with
respect to $t$ and again set $t = \lambda_i$ and deduce that $\alpha_{\lambda_i,2} = 0$, and so on.)

ii) Since the $Q_{\lambda_i,j}$ form a basis, and since $1 \in F_{N-1}$, argue that we can find
$\beta_{\lambda_i,j}$ such that

$$1 = \sum_{\lambda_i, j} \beta_{\lambda_i,j}\, Q_{\lambda_i,j}(t).$$

Now define

$$P_i = \sum_{j=1}^{r_i} \beta_{\lambda_i,j}\, Q_{\lambda_i,j}(T),$$

and so

$$I = \sum_{\lambda_i} P_i. \qquad (\star)$$

Use the minimal polynomial equation to deduce that $P_iP_j = 0$ if $i \neq j$.
Multiplication of $\star$ by $P_i$ then shows that $P_iP_j = \delta_{ij}P_j$. Deduce from this
that $\star$ is a resolution of the identity into a sum of mutually orthogonal
projection operators $P_i$ that project onto the spaces $V_{\lambda_i}$. Conclude that
any $\mathbf{x}$ can be expanded as $\mathbf{x} = \sum_i \mathbf{x}_i$ with $\mathbf{x}_i \equiv P_i\mathbf{x} \in V_{\lambda_i}$.

iii) Show that the decomposition also implies that $V_{\lambda_i} \cap V_{\lambda_j} = \{0\}$ if $i \neq j$.
(Hint: a vector in $V_{\lambda_i}$ is killed by all projectors with the possible
exception of $P_i$ and a vector in $V_{\lambda_j}$ will be killed by all the projectors
with the possible exception of $P_j$.)

iv) Put these results together to deduce that $V$ is a direct sum of the $V_{\lambda_i}$.

v) Combine the result of part iv) with the ideas behind exercise A.15 to
complete the proof of the Jordan decomposition theorem.


A.7.2  Diagonalizing quadratic forms 


Do not confuse the notion of diagonalizing the matrix representing a linear
map $A : V \to V$ with that of diagonalizing the matrix representing a
quadratic form. A (real) quadratic form is a map $Q : V \to \mathbb{R}$, which is
obtained from a symmetric bilinear form $B : V \times V \to \mathbb{R}$ by setting the two
arguments, $\mathbf{x}$ and $\mathbf{y}$, in $B(\mathbf{x}, \mathbf{y})$ equal:

$$Q(\mathbf{x}) = B(\mathbf{x}, \mathbf{x}). \tag{A.109}$$

No information is lost by this specialization. We can recover the non-diagonal
($\mathbf{x} \neq \mathbf{y}$) values of $B$ from the diagonal values, $Q(\mathbf{x})$, by using the polarization
trick

$$B(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\left[ Q(\mathbf{x} + \mathbf{y}) - Q(\mathbf{x}) - Q(\mathbf{y}) \right]. \tag{A.110}$$

An example of a real quadratic form is the kinetic energy term

$$T(\dot{\mathbf{x}}) = \frac{1}{2} m_{ij}\dot{x}^i\dot{x}^j = \frac{1}{2}\dot{\mathbf{x}}^T M \dot{\mathbf{x}} \tag{A.111}$$

in a "small vibrations" Lagrangian. Here, $M$, with entries $m_{ij}$, is the mass
matrix.

Whilst one can diagonalize such forms by the tedious procedure of finding
the eigenvalues and eigenvectors of the associated matrix, it is simpler to use
Lagrange's method, which is based on repeatedly completing squares.

Consider, for example, the quadratic form

$$Q = x^2 - y^2 - z^2 + 2xy - 4xz + 6yz = (x, y, z) \begin{pmatrix} 1 & 1 & -2 \\ 1 & -1 & 3 \\ -2 & 3 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{A.112}$$

We complete the square involving $x$:

$$Q = (x + y - 2z)^2 - 2y^2 + 10yz - 5z^2, \tag{A.113}$$

where the terms outside the squared group no longer involve $x$. We now
complete the square in $y$:

$$Q = (x + y - 2z)^2 - \left( \sqrt{2}\,y - \frac{5}{\sqrt{2}}z \right)^2 + \frac{15}{2}z^2, \tag{A.114}$$

so that the remaining term no longer contains $y$. Thus, on setting

$$\begin{aligned}
\xi &= x + y - 2z, \\
\eta &= \sqrt{2}\,y - \frac{5}{\sqrt{2}}z, \\
\zeta &= \sqrt{\frac{15}{2}}\,z,
\end{aligned}$$

we have

$$Q = \xi^2 - \eta^2 + \zeta^2 = (\xi, \eta, \zeta) \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix}. \tag{A.115}$$
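The outcome of completing the squares can be checked by evaluating both expressions for the form $Q = x^2 - y^2 - z^2 + 2xy - 4xz + 6yz$ at a few points. The sketch below is only a numerical sanity check; the function names and the sample points are assumptions made here.

```python
import math

def Q(x, y, z):
    """The quadratic form in its original variables."""
    return x * x - y * y - z * z + 2 * x * y - 4 * x * z + 6 * y * z

def diagonal_form(x, y, z):
    """The same form after completing the square in x and then in y."""
    xi = x + y - 2 * z
    eta = math.sqrt(2) * y - 5 / math.sqrt(2) * z
    zeta = math.sqrt(15 / 2) * z
    return xi ** 2 - eta ** 2 + zeta ** 2

samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, -2, 3), (0.5, 2.0, -1.5)]
for point in samples:
    print(point, Q(*point), diagonal_form(*point))
```

The two columns of values agree up to floating-point rounding, and the signs of the three squares give the signature (two plus signs and one minus sign).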


If there are no $x^2$, $y^2$, or $z^2$ terms to get us started, then we can proceed by
using $(x + y)^2$ and $(x - y)^2$. For example, consider

$$\begin{aligned}
Q &= 2xy + 2yz + 2zx \\
&= \frac{1}{2}(x + y)^2 - \frac{1}{2}(x - y)^2 + 2xz + 2yz \\
&= \frac{1}{2}(x + y)^2 + 2(x + y)z - \frac{1}{2}(x - y)^2 \\
&= \frac{1}{2}(x + y + 2z)^2 - \frac{1}{2}(x - y)^2 - 2z^2 \\
&= \xi^2 - \eta^2 - \zeta^2,
\end{aligned}$$

where

$$\begin{aligned}
\xi &= \frac{1}{\sqrt{2}}(x + y + 2z), \\
\eta &= \frac{1}{\sqrt{2}}(x - y), \\
\zeta &= \sqrt{2}\,z.
\end{aligned}$$

A judicious combination of these two tactics will reduce the matrix
representing any real quadratic form to a matrix with $\pm 1$'s and 0's on the diagonal,
and zeros elsewhere. As the egregiously asymmetric treatment of $x$, $y$, $z$ in
the last example indicates, this can be done in many ways, but Sylvester's Law
of Inertia asserts that the signature (the number of $+1$'s, $-1$'s and 0's)
will always be the same. Naturally, if we allow complex numbers in the
redefinitions of the variables, we can always reduce the form to one with only
$+1$'s and 0's.

The essential difference between diagonalizing linear maps and
diagonalizing quadratic forms is that in the former case we seek matrices $A$ such that
$A^{-1}MA$ is diagonal, whereas in the latter case we seek matrices $A$ such that
$A^TMA$ is diagonal. Here, the superscript $T$ denotes transposition.


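Exercise A.17: Show that the matrix

$$Q = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$$

representing the quadratic form

$$Q(x, y) = ax^2 + 2bxy + cy^2$$

may be reduced to

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \text{or} \quad \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$

(up to an overall sign in the first and last cases),
depending on whether the discriminant, $ac - b^2$, is respectively greater than
zero, less than zero, or equal to zero.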


Warning: You might be tempted to refer to the discriminant $ac - b^2$ as being
the determinant of $Q$. It is indeed the determinant of the matrix $Q$, but there
is no such thing as the "determinant" of the quadratic form itself. You may
compute the determinant of the matrix representing $Q$ in some basis, but if
you change basis and repeat the calculation you will get a different answer.
For real quadratic forms, however, the sign of the determinant stays the same,
and this is all that the discriminant cares about.


A.7.3 Block-diagonalizing symplectic forms


A skew-symmetric bilinear form $\omega : V \times V \to \mathbb{R}$ is often called a symplectic
form. Such forms play an important role in Hamiltonian dynamics and in
optics. Let $\{\mathbf{e}_i\}$ be a basis for $V$, and set

$$\omega(\mathbf{e}_i, \mathbf{e}_j) = \omega_{ij}. \tag{A.116}$$

If $\mathbf{x} = x^i\mathbf{e}_i$ and $\mathbf{y} = y^i\mathbf{e}_i$, we therefore have

$$\omega(\mathbf{x}, \mathbf{y}) = \omega(\mathbf{e}_i, \mathbf{e}_j)x^iy^j = \omega_{ij}x^iy^j. \tag{A.117}$$

The numbers $\omega_{ij}$ can be thought of as the entries in a real skew-symmetric
matrix $\Omega$, in terms of which $\omega(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\Omega\mathbf{y}$. We cannot exactly
"diagonalize" such a skew-symmetric matrix because a matrix with non-zero entries
only on its principal diagonal is necessarily symmetric. We can do the next
best thing, however, and reduce $\Omega$ to block diagonal form with simple 2-by-2
skew matrices along the diagonal.

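We begin by expanding $\omega$ as

$$\omega = \frac{1}{2}\omega_{ij}\, \mathbf{e}^{*i} \wedge \mathbf{e}^{*j} \tag{A.118}$$

where the wedge (or exterior) product $\mathbf{e}^{*i} \wedge \mathbf{e}^{*j}$ of a pair of basis vectors in
$V^*$ denotes the particular skew-symmetric bilinear form

$$\mathbf{e}^{*i} \wedge \mathbf{e}^{*j}(\mathbf{e}_\alpha, \mathbf{e}_\beta) = \delta^i_\alpha\delta^j_\beta - \delta^i_\beta\delta^j_\alpha. \tag{A.119}$$

Again, if $\mathbf{x} = x^i\mathbf{e}_i$ and $\mathbf{y} = y^i\mathbf{e}_i$, we have

$$\begin{aligned}
\mathbf{e}^{*i} \wedge \mathbf{e}^{*j}(\mathbf{x}, \mathbf{y}) &= \mathbf{e}^{*i} \wedge \mathbf{e}^{*j}(x^\alpha\mathbf{e}_\alpha, y^\beta\mathbf{e}_\beta) \\
&= (\delta^i_\alpha\delta^j_\beta - \delta^i_\beta\delta^j_\alpha)x^\alpha y^\beta \\
&= x^iy^j - y^ix^j.
\end{aligned} \tag{A.120}$$

Consequently

$$\omega(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\omega_{ij}(x^iy^j - y^ix^j) = \omega_{ij}x^iy^j, \tag{A.121}$$

as before. We extend the definition of the wedge product to other elements
of $V^*$ by requiring "$\wedge$" to be associative and distributive, taking note that

$$\mathbf{e}^{*i} \wedge \mathbf{e}^{*j} = -\mathbf{e}^{*j} \wedge \mathbf{e}^{*i}, \tag{A.122}$$

and so $0 = \mathbf{e}^{*1} \wedge \mathbf{e}^{*1} = \mathbf{e}^{*2} \wedge \mathbf{e}^{*2}$, etc.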
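We next show that there exists a basis $\{\mathbf{f}^{*i}\}$ of $V^*$ such that

$$\omega = \mathbf{f}^{*1} \wedge \mathbf{f}^{*2} + \mathbf{f}^{*3} \wedge \mathbf{f}^{*4} + \cdots + \mathbf{f}^{*(p-1)} \wedge \mathbf{f}^{*p}. \tag{A.123}$$

Here, the integer $p \leq n$ is the rank of $\omega$. It is necessarily an even number.
The new basis is constructed by a skew-analogue of Lagrange's method
of completing the square. If

$$\omega = \frac{1}{2}\omega_{ij}\, \mathbf{e}^{*i} \wedge \mathbf{e}^{*j} \tag{A.124}$$

is not identically zero, we can, after re-ordering the basis if necessary, assume
that $\omega_{12} \neq 0$. Then

$$\omega = \left( \mathbf{e}^{*1} - \frac{1}{\omega_{12}}(\omega_{23}\mathbf{e}^{*3} + \cdots + \omega_{2n}\mathbf{e}^{*n}) \right) \wedge (\omega_{12}\mathbf{e}^{*2} + \omega_{13}\mathbf{e}^{*3} + \cdots + \omega_{1n}\mathbf{e}^{*n}) + \omega^{\{3\}} \tag{A.125}$$

where $\omega^{\{3\}} \in \bigwedge^2(V^*)$ does not contain $\mathbf{e}^{*1}$ or $\mathbf{e}^{*2}$. We set

$$\mathbf{f}^{*1} = \mathbf{e}^{*1} - \frac{1}{\omega_{12}}(\omega_{23}\mathbf{e}^{*3} + \cdots + \omega_{2n}\mathbf{e}^{*n}) \tag{A.126}$$

and

$$\mathbf{f}^{*2} = \omega_{12}\mathbf{e}^{*2} + \omega_{13}\mathbf{e}^{*3} + \cdots + \omega_{1n}\mathbf{e}^{*n}. \tag{A.127}$$

Thus,

$$\omega = \mathbf{f}^{*1} \wedge \mathbf{f}^{*2} + \omega^{\{3\}}. \tag{A.128}$$

If the remainder $\omega^{\{3\}}$ is identically zero, we are done. Otherwise, we apply
the same process to $\omega^{\{3\}}$ so as to construct $\mathbf{f}^{*3}$, $\mathbf{f}^{*4}$ and $\omega^{\{5\}}$; we continue
in this manner until we find a remainder, $\omega^{\{p+1\}}$, that vanishes.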

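Let $\{\mathbf{f}_i\}$ be the basis for $V$ dual to the basis $\{\mathbf{f}^{*i}\}$. Then $\omega(\mathbf{f}_1, \mathbf{f}_2) =
-\omega(\mathbf{f}_2, \mathbf{f}_1) = \omega(\mathbf{f}_3, \mathbf{f}_4) = -\omega(\mathbf{f}_4, \mathbf{f}_3) = 1$, and so on, all other values being zero.
This shows that if we define the coefficients $a^i{}_j$ by expressing $\mathbf{f}^{*i} = a^i{}_j\mathbf{e}^{*j}$,
and hence $\mathbf{e}_i = \mathbf{f}_j a^j{}_i$, then the matrix $\Omega$ has been expressed as

$$\Omega = A^T\tilde{\Omega}A, \tag{A.129}$$

where $A$ is the matrix with entries $a^i{}_j$, and $\tilde{\Omega}$ is the matrix

$$\tilde{\Omega} = \begin{pmatrix} 0 & 1 & & & \\ -1 & 0 & & & \\ & & 0 & 1 & \\ & & -1 & 0 & \\ & & & & \ddots \end{pmatrix}, \tag{A.130}$$

which contains $p/2$ diagonal blocks of

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \tag{A.131}$$

and all other entries are zero.


Example: Consider the skew bilinear form

$$\omega(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\Omega\mathbf{y} = (x^1, x^2, x^3, x^4) \begin{pmatrix} 0 & 1 & 3 & 0 \\ -1 & 0 & 1 & 5 \\ -3 & -1 & 0 & 0 \\ 0 & -5 & 0 & 0 \end{pmatrix} \begin{pmatrix} y^1 \\ y^2 \\ y^3 \\ y^4 \end{pmatrix}. \tag{A.132}$$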
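This corresponds to

$$\omega = \mathbf{e}^{*1} \wedge \mathbf{e}^{*2} + 3\,\mathbf{e}^{*1} \wedge \mathbf{e}^{*3} + \mathbf{e}^{*2} \wedge \mathbf{e}^{*3} + 5\,\mathbf{e}^{*2} \wedge \mathbf{e}^{*4}. \tag{A.133}$$

Following our algorithm, we write $\omega$ as

$$\omega = (\mathbf{e}^{*1} - \mathbf{e}^{*3} - 5\,\mathbf{e}^{*4}) \wedge (\mathbf{e}^{*2} + 3\,\mathbf{e}^{*3}) - 15\,\mathbf{e}^{*3} \wedge \mathbf{e}^{*4}. \tag{A.134}$$

If we now set

$$\begin{aligned}
\mathbf{f}^{*1} &= \mathbf{e}^{*1} - \mathbf{e}^{*3} - 5\,\mathbf{e}^{*4}, \\
\mathbf{f}^{*2} &= \mathbf{e}^{*2} + 3\,\mathbf{e}^{*3}, \\
\mathbf{f}^{*3} &= -15\,\mathbf{e}^{*3}, \\
\mathbf{f}^{*4} &= \mathbf{e}^{*4},
\end{aligned} \tag{A.135}$$

we have

$$\omega = \mathbf{f}^{*1} \wedge \mathbf{f}^{*2} + \mathbf{f}^{*3} \wedge \mathbf{f}^{*4}. \tag{A.136}$$

We have correspondingly expressed the matrix $\Omega$ as

$$\begin{pmatrix} 0 & 1 & 3 & 0 \\ -1 & 0 & 1 & 5 \\ -3 & -1 & 0 & 0 \\ 0 & -5 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 3 & -15 & 0 \\ -5 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -1 & -5 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & -15 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$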
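The skew-analogue of completing the square can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `skew_reduce` and `wedge_sum` are our own, exact rational arithmetic is an assumption made for convenience, and instead of re-ordering the basis the code pivots on the first non-zero entry of the remainder. Because the choice of $\mathbf{f}^{*3}$, $\mathbf{f}^{*4}$ is not unique, the second pair it finds ($\mathbf{e}^{*3}$ and $-15\,\mathbf{e}^{*4}$) differs from (A.135) by a rescaling, but the sum of wedge products still reproduces $\Omega$.

```python
from fractions import Fraction


def skew_reduce(omega):
    """Return covector pairs (f_odd, f_even) with omega = sum of f_odd ^ f_even.

    `omega` is a skew-symmetric matrix given as a list of rows.  Each covector
    is a list of components with respect to the original dual basis e^{*j}.
    """
    n = len(omega)
    rem = [[Fraction(v) for v in row] for row in omega]  # current remainder
    pairs = []
    while True:
        # Find a non-zero entry rem[i][j]; this replaces "re-order the basis".
        pivot = next(((i, j) for i in range(n) for j in range(i + 1, n)
                      if rem[i][j] != 0), None)
        if pivot is None:          # remainder vanishes: we are done
            return pairs
        i, j = pivot
        w = rem[i][j]
        f1 = [-rem[j][k] / w for k in range(n)]   # generalizes (A.126)
        f2 = list(rem[i])                         # generalizes (A.127)
        pairs.append((f1, f2))
        # Subtract the matrix of f1 ^ f2 to obtain the next remainder.
        rem = [[rem[a][b] - (f1[a] * f2[b] - f2[a] * f1[b])
                for b in range(n)] for a in range(n)]


def wedge_sum(pairs, n):
    """Matrix of the bilinear form sum_k f_{2k-1} ^ f_{2k}."""
    total = [[Fraction(0)] * n for _ in range(n)]
    for f1, f2 in pairs:
        for a in range(n):
            for b in range(n):
                total[a][b] += f1[a] * f2[b] - f2[a] * f1[b]
    return total


OMEGA = [[0, 1, 3, 0], [-1, 0, 1, 5], [-3, -1, 0, 0], [0, -5, 0, 0]]
pairs = skew_reduce(OMEGA)
for f_odd, f_even in pairs:
    print([str(c) for c in f_odd], [str(c) for c in f_even])
print(wedge_sum(pairs, 4) == OMEGA)
```

The number of pairs returned is $p/2$, so the rank of the form can be read off directly from the length of the list.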
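Exercise A.18: Let $\Omega$ be a skew symmetric $2n$-by-$2n$ matrix with entries
$\omega_{ij} = -\omega_{ji}$. Define the Pfaffian of $\Omega$ by

$$\mathrm{Pf}\,\Omega = \frac{1}{2^n n!} \sum \epsilon_{i_1 i_2 \ldots i_{2n}}\, \omega_{i_1 i_2} \omega_{i_3 i_4} \cdots \omega_{i_{2n-1} i_{2n}}.$$

Show that $\mathrm{Pf}\,(M^T \Omega M) = \det(M)\, \mathrm{Pf}\,(\Omega)$. By reducing $\Omega$ to a suitable canonical
form, show that $(\mathrm{Pf}\,\Omega)^2 = \det \Omega$.


Exercise A.19: Let $\omega(\mathbf{x}, \mathbf{y})$ be a non-degenerate skew symmetric bilinear form
on $\mathbb{R}^{2n}$, and $\mathbf{x}_1, \ldots, \mathbf{x}_{2n}$ a set of vectors. Prove Weyl’s identity

$$\mathrm{Pf}\,(\Omega) \det |\mathbf{x}_1, \ldots, \mathbf{x}_{2n}| = \frac{1}{2^n n!} \sum \epsilon_{i_1, \ldots, i_{2n}}\, \omega(\mathbf{x}_{i_1}, \mathbf{x}_{i_2}) \cdots \omega(\mathbf{x}_{i_{2n-1}}, \mathbf{x}_{i_{2n}}).$$

Here $\det |\mathbf{x}_1, \ldots, \mathbf{x}_{2n}|$ is the determinant of the matrix whose rows are the $\mathbf{x}_i$
and $\Omega$ is the matrix corresponding to the form $\omega$.


Now let $M : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ be a linear map. Show that

$$\mathrm{Pf}\,(\Omega)\, (\det M) \det |\mathbf{x}_1, \ldots, \mathbf{x}_{2n}| = \frac{1}{2^n n!} \sum \epsilon_{i_1, \ldots, i_{2n}}\, \omega(M\mathbf{x}_{i_1}, M\mathbf{x}_{i_2}) \cdots \omega(M\mathbf{x}_{i_{2n-1}}, M\mathbf{x}_{i_{2n}}).$$

Deduce that if $\omega(M\mathbf{x}, M\mathbf{y}) = \omega(\mathbf{x}, \mathbf{y})$ for all vectors $\mathbf{x}, \mathbf{y}$, then $\det M = 1$.


The set of such matrices $M$ that preserve $\omega$ compose the symplectic group
$\mathrm{Sp}(2n, \mathbb{R})$.


Appendix B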


Fourier Series and Integrals. 


Fourier series and Fourier integral representations are the most important 
examples of the expansion of a function in terms of a complete orthonormal 
set. The material in this appendix reviews features peculiar to these special 
cases, and is intended to complement the general discussion of orthogonal
series in chapter 2. 


B.1 Fourier series 


A function defined on a finite interval may be expanded as a Fourier series. 


B.1.1 Finite Fourier series 


Suppose we have measured $f(x)$ in the interval $[0, L]$, but only at the discrete
set of points $x = na$, where $a$ is the sampling interval and $n = 0, 1, \ldots, N-1$,
with $Na = L$. We can then represent our data $f(na)$ by a finite Fourier
series. This representation is based on the geometric sum

$$\sum_{m=0}^{N-1} e^{ik_m(n'-n)a} = \frac{e^{2\pi i(n'-n)} - 1}{e^{2\pi i(n'-n)/N} - 1}, \tag{B.1}$$

where $k_m \equiv 2\pi m/Na$. For integer $n$ and $n'$, the expression on the right
hand side of (B.1) is zero unless $n' - n$ is an integer multiple of $N$, when
it becomes indeterminate. In this case, however, each term on the left hand
side is equal to unity, and so their sum is equal to $N$. If we restrict $n$ and $n'$
to lie between $0$ and $N - 1$, we have

$$\sum_{m=0}^{N-1} e^{ik_m(n'-n)a} = N\delta_{n'n}. \tag{B.2}$$

Inserting (B.2) into the formula

$$f(na) = \sum_{n'=0}^{N-1} f(n'a)\, \delta_{n'n}, \tag{B.3}$$

shows that

$$f(na) = \sum_{m=0}^{N-1} a_m e^{-ik_m na}, \quad \text{where} \quad a_m \equiv \frac{1}{N} \sum_{n=0}^{N-1} f(na)\, e^{ik_m na}. \tag{B.4}$$


This is the finite Fourier representation. 

When $f(na)$ is real, it is convenient to make the $k_m$ sum symmetric about
$k_m = 0$ by taking $N = 2M + 1$ and setting the summation limits to be $\pm M$.
The finite geometric sum then becomes

$$\sum_{m=-M}^{M} e^{im\theta} = \frac{\sin\left((2M+1)\theta/2\right)}{\sin(\theta/2)}. \tag{B.5}$$

We set $\theta = 2\pi(n' - n)/N$ and use the same tactics as before to deduce that

$$f(na) = \sum_{m=-M}^{M} a_m e^{-ik_m na}, \tag{B.6}$$

where again $k_m = 2\pi m/L$, with $L = Na$, and

$$a_m = \frac{1}{N} \sum_{n=0}^{2M} f(na)\, e^{ik_m na}. \tag{B.7}$$

In this form it is manifest that $f$ being real both implies and is implied by
$a_{-m} = a_m^*$.

These finite Fourier expansions are algebraic identities. No limits have 
to be taken, and so no restrictions need be placed on f(na) for them to be 
valid. They are all that is needed for processing experimental data. 

Although the initial $f(na)$ was defined only for the finite range $0 \le n \le
N - 1$, the Fourier sum (B.4) or (B.6) is defined for any $n$, and so extends $f$
to a periodic function of $n$ with period $N$.
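Since the finite Fourier representation is an exact algebraic identity, it can be checked directly on arbitrary data. The sketch below uses only the Python standard library; the function names and the sample values are illustrative assumptions, and the sign conventions are those of (B.4).

```python
import cmath


def finite_fourier_coeffs(samples):
    """Coefficients a_m = (1/N) sum_n f(na) exp(+i k_m n a), with k_m a = 2 pi m / N."""
    N = len(samples)
    return [sum(f * cmath.exp(2j * cmath.pi * m * n / N)
                for n, f in enumerate(samples)) / N
            for m in range(N)]


def finite_fourier_eval(coeffs, n):
    """Evaluate sum_m a_m exp(-i k_m n a) at any integer n."""
    N = len(coeffs)
    return sum(a * cmath.exp(-2j * cmath.pi * m * n / N)
               for m, a in enumerate(coeffs))


samples = [0.3, -1.2, 4.0, 2.5, 0.0, -0.7]   # arbitrary "measured" data, N = 6
coeffs = finite_fourier_coeffs(samples)
rebuilt = [finite_fourier_eval(coeffs, n) for n in range(len(samples))]

print(max(abs(r - s) for r, s in zip(rebuilt, samples)))   # round-off only
print(abs(finite_fourier_eval(coeffs, 8) - samples[2]))    # period N: n = 8 matches n = 2
```

No smoothness of the data is needed for the round trip, and evaluating the sum outside $0 \le n \le N-1$ simply repeats the data with period $N$.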
B.1.2 Continuum limit


Now we wish to derive a Fourier representation for functions defined everywhere
on the interval $[0, L]$, rather than just at the sampling points. The natural
way to proceed is to build on the results from the previous section by replacing
the interval $[0, L]$ with a discrete lattice of $N = 2M + 1$ points at
$x = na$, where $a$ is a small lattice spacing which we ultimately take to zero.
For any non-zero $a$ the continuum function $f(x)$ is thus replaced by the finite
set of numbers $f(na)$. If we stand back and blur our vision so that we can no
longer perceive the individual lattice points, a plot of this discrete function
will look little different from the original continuum $f(x)$. In other words,
provided that $f$ is slowly varying on the scale of the lattice spacing, $f(an)$
can be regarded as a smooth function of $x = an$.
The basic “integration rule” for such smooth functions is that

$$a \sum_n f(an) \to \int f(an)\, a\, dn \to \int f(x)\, dx, \tag{B.8}$$

as $a$ becomes small. A sum involving a Kronecker $\delta$ will become an integral
containing a Dirac $\delta$-function:

$$a \sum_n f(na)\, \frac{1}{a}\, \delta_{nm} = f(ma) \to \int f(x)\, \delta(x - y)\, dx = f(y). \tag{B.9}$$

We can therefore think of the $\delta$ function as arising from

$$\frac{\delta_{nn'}}{a} \to \delta(x - x'). \tag{B.10}$$

In particular, the divergent quantity $\delta(0)$ (in $x$ space) is obtained by setting
$n = n'$, and can therefore be understood to be the reciprocal of the lattice
spacing, or, equivalently, the number of lattice points per unit volume.

Now we take the formal continuum limit of (B.6) and (B.7) by letting $a \to 0$ and
$N \to \infty$ while keeping their product $Na = L$ fixed. The finite Fourier
representation

$$f(na) = \sum_{m=-M}^{M} a_m e^{-\frac{2\pi i m}{Na} na} \tag{B.11}$$

now becomes an infinite series

$$f(x) = \sum_{m=-\infty}^{\infty} a_m e^{-2\pi i m x/L}, \tag{B.12}$$

whereas

$$a_m = \frac{a}{Na} \sum_{n=0}^{N-1} f(na)\, e^{\frac{2\pi i m}{Na} na} \to \frac{1}{L} \int_0^L f(x)\, e^{2\pi i m x/L}\, dx. \tag{B.13}$$


The series (B.12) is the Fourier expansion for a function on a finite interval.
The sum is equal to $f(x)$ in the interval $[0, L]$. Outside, it produces $L$-periodic
translates of the original $f$.

This Fourier expansion (B.12, B.13) is the same series that we would obtain
by using the $L^2[0, L]$ orthonormality

$$\frac{1}{L} \int_0^L e^{2\pi i m x/L}\, e^{-2\pi i n x/L}\, dx = \delta_{nm}, \tag{B.14}$$

and using the methods of chapter two. The arguments adduced there, however,
guarantee convergence only in the $L^2$ sense. While our present “continuum
limit” derivation is only heuristic, it does suggest that for reasonably-behaved
periodic functions $f$ the Fourier series (B.12) converges pointwise to
$f(x)$. A continuous periodic function possessing a continuous first derivative
is sufficiently “well-behaved” for pointwise convergence. Furthermore, if the
function $f$ is smooth then the convergence is uniform. This is useful to know,
but we often desire a Fourier representation for a function with discontinuities.
A stronger result is that if $f$ is piecewise continuous in $[0, L]$ (i.e.,
continuous with the exception of at most a finite number of discontinuities)
and its first derivative is also piecewise continuous, then the Fourier series
will converge pointwise (but not uniformly¹) to $f(x)$ at points where $f(x)$ is
continuous, and to its average

$$F(x) = \frac{1}{2} \lim_{\epsilon \to 0} \left\{ f(x + \epsilon) + f(x - \epsilon) \right\} \tag{B.15}$$

at those points where $f(x)$ has jumps. In section B.3.2 we shall explain
why the series converges to this average, and examine the nature of this
convergence.

Most functions of interest to engineers are piecewise continuous, and this
result is then all that they require. In physics, however, we often have to
work with a broader class of functions, and so other forms of convergence
become relevant. In quantum mechanics, in particular, the probability
interpretation of the wavefunction requires only convergence in the $L^2$ sense, and
this demands no smoothness properties at all: the Fourier representation
converges to $f$ whenever the $L^2$ norm $\|f\|^2$ is finite.


¹If a sequence of continuous functions converges uniformly, then its limit function is
continuous.


Half-range Fourier series 
The exponential series

$$f(x) = \sum_{m=-\infty}^{\infty} a_m e^{-2\pi i m x/L} \tag{B.16}$$

can be re-expressed as the trigonometric sum

$$f(x) = \frac{1}{2} A_0 + \sum_{m=1}^{\infty} \left\{ A_m \cos(2\pi m x/L) + B_m \sin(2\pi m x/L) \right\}, \tag{B.17}$$

where

$$A_m = \begin{cases} 2a_0, & m = 0, \\ a_m + a_{-m}, & m > 0, \end{cases} \qquad B_m = i(a_{-m} - a_m). \tag{B.18}$$

This is called a full-range trigonometric Fourier series for functions defined on
$[0, L]$. In chapter 2 we expanded functions in series containing only sines. We
can expand any function $f(x)$ defined on a finite interval as such a half-range
Fourier series. To do this, we regard the given domain of $f(x)$ as being the
half interval $[0, L/2]$ (hence the name). We then extend $f(x)$ to a function
on the whole of $[0, L]$ and expand as usual. If we extend $f(x)$ by setting
$f(x + L/2) = -f(L/2 - x)$ then the $A_m$ are all zero and we have

$$f(x) = \sum_{m=1}^{\infty} B_m \sin(2\pi m x/L), \quad x \in [0, L/2], \tag{B.19}$$

where

$$B_m = \frac{4}{L} \int_0^{L/2} f(x) \sin(2\pi m x/L)\, dx. \tag{B.20}$$

Alternatively, we may extend the range of definition by setting $f(x + L/2) =
f(L/2 - x)$. In this case it is the $B_m$ that become zero and we have

$$f(x) = \frac{1}{2} A_0 + \sum_{m=1}^{\infty} A_m \cos(2\pi m x/L), \quad x \in [0, L/2], \tag{B.21}$$

with

$$A_m = \frac{4}{L} \int_0^{L/2} f(x) \cos(2\pi m x/L)\, dx. \tag{B.22}$$
The difference between a full-range and a half-range series is therefore
seen principally in the continuation of the function outside its initial interval
of definition. A full-range series repeats the function periodically. A half-range
sine series reflects the function about each interval endpoint and also
changes its sign, whilst the half-range cosine series reflects the
function as if each interval endpoint were a mirror.
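As a quick numerical illustration of the half-range sine coefficients (B.20), the sketch below evaluates them by the midpoint rule for the constant function $f(x) = 1$ on $[0, L/2]$, for which the integral gives $B_m = 4/(\pi m)$ for odd $m$ and $B_m = 0$ for even $m$. The function name, the choice $L = 2$ and the number of quadrature steps are assumptions made for the example.

```python
import math


def half_range_sine_coeff(f, m, L, steps=2000):
    """Midpoint-rule estimate of B_m = (4/L) * integral_0^{L/2} f(x) sin(2 pi m x / L) dx."""
    h = (L / 2) / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h            # midpoint of the j-th sub-interval
        total += f(x) * math.sin(2 * math.pi * m * x / L)
    return 4.0 / L * total * h


L = 2.0
for m in range(1, 6):
    numeric = half_range_sine_coeff(lambda x: 1.0, m, L)
    exact = 4.0 / (math.pi * m) if m % 2 else 0.0
    print(m, round(numeric, 6), round(exact, 6))
```

The slow $1/m$ decay of these coefficients reflects the jumps that the odd extension of a non-zero constant has at the interval endpoints.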


B.2 Fourier integral transforms 


When the function we wish to represent is defined on the entirety of R then 
we must use the Fourier integral representation. 


B.2.1 Inversion formula 


We formally obtain the Fourier integral representation from the Fourier series
for a function defined on $[-L/2, L/2]$. Start from

$$f(x) = \sum_{m=-\infty}^{\infty} a_m e^{-\frac{2\pi i m}{L} x}, \tag{B.23}$$

$$a_m = \frac{1}{L} \int_{-L/2}^{L/2} f(x)\, e^{\frac{2\pi i m}{L} x}\, dx, \tag{B.24}$$

and let $L$ become large. The discrete $k_m = 2\pi m/L$ then merge into the
continuous variable $k$ and

$$\frac{1}{L} \sum_m \to \int \frac{dk}{2\pi}. \tag{B.25}$$

The product $L a_m$ remains finite, and becomes a function $\tilde{f}(k)$. Thus

$$f(x) = \int_{-\infty}^{\infty} \tilde{f}(k)\, e^{-ikx}\, \frac{dk}{2\pi}, \tag{B.26}$$

$$\tilde{f}(k) = \int_{-\infty}^{\infty} f(x)\, e^{ikx}\, dx. \tag{B.27}$$

This is the Fourier integral transform and its inverse.
It is good practice when doing Fourier transforms in physics to treat $x$
and $k$ asymmetrically: always put the $2\pi$’s with the $dk$’s. This is because,
as (B.25) shows, $dk/2\pi$ has the physical meaning of the number of Fourier
modes per unit (spatial) volume with wavenumber between $k$ and $k + dk$.

The Fourier representation of the Dirac delta-function is

$$\delta(x - x') = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, e^{ik(x - x')}. \tag{B.28}$$

Suppose we put $x = x'$. Then “$\delta(0)$”, which we earlier saw can be interpreted
as the inverse lattice spacing, and hence the density of lattice points, is equal
to $\int_{-\infty}^{\infty} \frac{dk}{2\pi}$. This is the total number of Fourier modes per unit length.
Exchanging $x$ and $k$ in the integral representation of $\delta(x - x')$ gives us
the Fourier representation for $\delta(k - k')$:

$$\int_{-\infty}^{\infty} e^{i(k - k')x}\, dx = 2\pi\, \delta(k - k'). \tag{B.29}$$

Thus $2\pi\delta(0)$ (in $k$ space), although mathematically divergent, has the physical
meaning $\int dx$, the volume of the system. It is good practice to put a $2\pi$
with each $\delta(k)$ because this combination has a direct physical interpretation.
Take care to note that the symbol $\delta(0)$ has a very different physical
interpretation depending on whether $\delta$ is a delta-function in $x$ or in $k$ space.


Parseval’s identity


Note that with the Fourier transform pair defined as

$$\tilde{f}(k) = \int_{-\infty}^{\infty} e^{ikx} f(x)\, dx, \tag{B.30}$$

$$f(x) = \int_{-\infty}^{\infty} e^{-ikx} \tilde{f}(k)\, \frac{dk}{2\pi}, \tag{B.31}$$

Parseval’s theorem takes the form

$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = \int_{-\infty}^{\infty} |\tilde{f}(k)|^2\, \frac{dk}{2\pi}. \tag{B.32}$$

Parseval’s theorem tells us that the Fourier transform is a unitary map
from $L^2(\mathbb{R}) \to L^2(\mathbb{R})$.


B.2.2 The Riemann-Lebesgue lemma


There is a reciprocal relationship between the rates at which a function and 
its Fourier transform decay at infinity. The more rapidly the function decays, 
the more high frequency modes it must contain—and hence the slower the 
decay of its Fourier transform. Conversely, the smoother a function the fewer 
high frequency modes it contains and the faster the decay of its transform. 
Quantitative estimates of this version of Heisenberg’s uncertainty principle 
are based on the Riemann-Lebesgue lemma. 

Recall that a function $f$ is in $L^1(\mathbb{R})$ if it is integrable (this condition
excludes the delta function) and goes to zero at infinity sufficiently rapidly
that

$$\|f\|_1 \equiv \int_{-\infty}^{\infty} |f(x)|\, dx < \infty. \tag{B.33}$$

If $f \in L^1(\mathbb{R})$ then its Fourier transform

$$\tilde{f}(k) \equiv \int_{-\infty}^{\infty} f(x)\, e^{ikx}\, dx \tag{B.34}$$

exists, is a continuous function of $k$, and

$$|\tilde{f}(k)| \le \|f\|_1. \tag{B.35}$$

The Riemann-Lebesgue lemma asserts that if $f \in L^1(\mathbb{R})$ then

$$\lim_{k \to \infty} \tilde{f}(k) = 0. \tag{B.36}$$

We will not give the proof. For $f$ integrable in the Riemann sense, it is not
difficult, being almost a corollary of the definition of the Riemann integral.
We must point out, however, that the “$|\ldots|$” modulus sign is essential in
the $L^1(\mathbb{R})$ condition. For example, the integral

$$I = \int_{-\infty}^{\infty} \sin(x^2)\, dx \tag{B.37}$$

is convergent, but only because of extensive cancellations. The $L^1(\mathbb{R})$ norm

$$\int_{-\infty}^{\infty} |\sin(x^2)|\, dx \tag{B.38}$$

is not finite, and whereas the Fourier transform of $\sin(x^2)$, i.e.

$$\int_{-\infty}^{\infty} \sin(x^2)\, e^{ikx}\, dx = \sqrt{\pi} \cos\left( \frac{k^2 + \pi}{4} \right), \tag{B.39}$$

is also convergent, it does not decay to zero as $k$ grows large.

The Riemann-Lebesgue lemma tells us that the Fourier transform maps
$L^1(\mathbb{R})$ into $C_\infty(\mathbb{R})$, the latter being the space of continuous functions vanishing
at infinity. Be careful: this map is only into and not onto. The inverse
Fourier transform of a function vanishing at infinity does not necessarily lie
in $L^1(\mathbb{R})$.

We link the smoothness of $f(x)$ to the rapid decay of $\tilde{f}(k)$, by combining
Riemann-Lebesgue with integration by parts. Suppose that both $f$ and $f'$
are in $L^1(\mathbb{R})$. Then

$$\widetilde{[f']}(k) \equiv \int_{-\infty}^{\infty} f'(x)\, e^{ikx}\, dx = -ik \int_{-\infty}^{\infty} f(x)\, e^{ikx}\, dx = -ik \tilde{f}(k) \tag{B.40}$$

tends to zero. (No boundary terms arise from the integration by parts because
for both $f$ and $f'$ to be in $L^1(\mathbb{R})$ the function $f$ must tend to zero at
infinity.) Since $k\tilde{f}(k)$ tends to zero, $\tilde{f}(k)$ itself must go to zero faster than
$1/k$. We can continue in this manner and see that each additional derivative
of $f$ that lies in $L^1(\mathbb{R})$ buys us an extra power of $1/k$ in the decay rate of
$\tilde{f}$ at infinity. If any derivative possesses a jump discontinuity, however, its
derivative will contain a delta-function, and a delta-function is not in $L^1(\mathbb{R})$.
Thus, if $n$ is the largest integer for which $k^n \tilde{f}(k) \to 0$ we may expect $f^{(n)}(x)$
to be somewhere discontinuous. For example, the function $f(x) = e^{-|x|}$ has
a first derivative that lies in $L^1(\mathbb{R})$, but this derivative is discontinuous. The
Fourier transform $\tilde{f}(k) = 2/(1 + k^2)$ therefore decays as $1/k^2$, but no faster.
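The decay rate quoted for $e^{-|x|}$ is easy to confirm numerically. The following sketch (standard library only; the function name, the integration cutoff and the step count are assumptions chosen for the example) approximates the transform integral by the midpoint rule and compares it with $2/(1 + k^2)$. The last printed column shows $k^2 \tilde{f}(k)$ approaching the non-zero constant 2, so the transform decays as $1/k^2$ and no faster.

```python
import math


def transform_of_exp_abs(k, cutoff=40.0, steps=40000):
    """Midpoint-rule estimate of the integral of exp(-|x|) exp(ikx) over the real line.

    The integrand is even in x, so we integrate 2 exp(-x) cos(kx) over 0 < x < cutoff.
    """
    h = cutoff / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += math.exp(-x) * math.cos(k * x)
    return 2.0 * h * total


for k in (1.0, 5.0, 20.0):
    numeric = transform_of_exp_abs(k)
    exact = 2.0 / (1.0 + k * k)
    print(k, numeric, exact, k * k * numeric)
```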


B.3 Convolution 


Suppose that f(x) and g(x) are functions on the real line R. We define their 
convolution f * g, when it exists, by 


$$[f * g](x) \equiv \int_{-\infty}^{\infty} f(x - \xi)\, g(\xi)\, d\xi. \tag{B.41}$$


A change of variable $\xi \to x - \xi$ shows that, despite the apparently asymmetric
treatment of $f$ and $g$ in the definition, the $*$ product obeys $f * g = g * f$.


B.3.1 The convolution theorem


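Now, let $\tilde{f}(k)$ denote the Fourier transform of $f$, i.e.

$$\tilde{f}(k) = \int_{-\infty}^{\infty} e^{ikx} f(x)\, dx. \tag{B.42}$$

We claim that

$$\widetilde{[f * g]} = \tilde{f}\, \tilde{g}. \tag{B.43}$$

The following computation shows that this claim is correct:

$$\begin{aligned}
\widetilde{[f * g]}(k) &= \int_{-\infty}^{\infty} e^{ikx} \left( \int_{-\infty}^{\infty} f(x - \xi)\, g(\xi)\, d\xi \right) dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{ikx} f(x - \xi)\, g(\xi)\, d\xi\, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{ik(x - \xi)}\, e^{ik\xi} f(x - \xi)\, g(\xi)\, d\xi\, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{ikx'}\, e^{ik\xi} f(x')\, g(\xi)\, d\xi\, dx' \\
&= \left( \int_{-\infty}^{\infty} e^{ikx'} f(x')\, dx' \right) \left( \int_{-\infty}^{\infty} e^{ik\xi} g(\xi)\, d\xi \right) \\
&= \tilde{f}(k)\, \tilde{g}(k).
\end{aligned} \tag{B.44}$$

Note that we have freely interchanged the order of integrations. This is not
always permissible, but it is allowed if $f, g \in L^1(\mathbb{R})$, in which case $f * g$ is
also in $L^1(\mathbb{R})$.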
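The convolution theorem can be checked numerically for a pair of Gaussians, for which everything is known in closed form: with $f(x) = g(x) = e^{-x^2/2}$ one has $\tilde{f}(k) = \sqrt{2\pi}\, e^{-k^2/2}$, so that $\widetilde{[f * g]}(k)$ should equal $2\pi e^{-k^2}$. The sketch below is illustrative only; the grid, the helper names and the simple rectangle-rule quadrature are our assumptions.

```python
import cmath
import math

H = 0.05                                      # grid spacing
GRID = [-10.0 + H * j for j in range(401)]    # covers [-10, 10]


def gauss(x):
    return math.exp(-0.5 * x * x)


def convolve(f, g, x):
    """Rectangle-rule estimate of [f * g](x) = integral f(x - xi) g(xi) d(xi)."""
    return H * sum(f(x - xi) * g(xi) for xi in GRID)


def transform(values, k):
    """Rectangle-rule estimate of the integral of e^{ikx} f(x) dx from grid values."""
    return H * sum(v * cmath.exp(1j * k * x) for v, x in zip(values, GRID))


k = 0.7
conv_values = [convolve(gauss, gauss, x) for x in GRID]
lhs = transform(conv_values, k)                        # transform of f * g
rhs = transform([gauss(x) for x in GRID], k) ** 2      # product of transforms
print(lhs, rhs, 2 * math.pi * math.exp(-k * k))
```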


B.3.2 Apodization and Gibbs’ phenomenon 


The convolution theorem is useful for understanding what happens when we 
truncate a Fourier series at a finite number of terms, or cut off a Fourier 
integral at a finite frequency or wavenumber. 

Consider, for example, the cut-off Fourier integral representation

$$f_\Lambda(x) \equiv \int_{-\Lambda}^{\Lambda} \tilde{f}(k)\, e^{-ikx}\, \frac{dk}{2\pi}, \tag{B.45}$$

where $\tilde{f}(k) = \int_{-\infty}^{\infty} f(x)\, e^{ikx}\, dx$ is the Fourier transform of $f$. We can write
this as

$$f_\Lambda(x) = \int_{-\infty}^{\infty} \theta_\Lambda(k)\, \tilde{f}(k)\, e^{-ikx}\, \frac{dk}{2\pi}, \tag{B.46}$$

where $\theta_\Lambda(k)$ is unity if $|k| < \Lambda$ and zero otherwise. Written this way, the
Fourier transform of $f_\Lambda$ becomes the product of the Fourier transform of the
original $f$ with $\theta_\Lambda$. The function $f_\Lambda$ itself is therefore the convolution

$$f_\Lambda(x) = \int_{-\infty}^{\infty} \delta^D_\Lambda(x - \xi)\, f(\xi)\, d\xi \tag{B.47}$$

of $f$ with

$$\delta^D_\Lambda(x) \equiv \frac{1}{\pi x} \sin(\Lambda x) = \int_{-\infty}^{\infty} \theta_\Lambda(k)\, e^{-ikx}\, \frac{dk}{2\pi}, \tag{B.48}$$

which is the inverse Fourier transform of $\theta_\Lambda(k)$. We see that $f_\Lambda(x)$ is a kind of
local average of the values of $f(x)$ smeared by the approximate delta-function
$\delta^D_\Lambda(x)$. The superscript D stands for “Dirichlet,” and $\delta^D_\Lambda(x)$ is known as the
Dirichlet kernel.


Figure B.1: A plot of $\pi\delta^D_\Lambda(x)$ for $\Lambda = 3$.


When $f(x)$ can be treated as a constant on the scale ($\approx 2\pi/\Lambda$) of the oscillation
in $\delta^D_\Lambda(x)$, all that matters is that $\int_{-\infty}^{\infty} \delta^D_\Lambda(x)\, dx = 1$, and so $f_\Lambda(x) \approx f(x)$.
This is the case if $f(x)$ is smooth and $\Lambda$ is sufficiently large. However, if $f(x)$
possesses a discontinuity at $x_0$, say, then we can never treat it as a constant and
the rapid oscillations in $\delta^D_\Lambda(x)$ cause a “ringing” in $f_\Lambda(x)$ whose amplitude
does not decrease (although the width of the region surrounding $x_0$ in which
the effect is noticeable will decrease) as $\Lambda$ grows. This ringing is known as
Gibbs’ phenomenon.


Figure B.2: The Gibbs phenomenon: A Fourier reconstruction of a piecewise
constant function that jumps discontinuously from $y = -0.25$ to $+0.25$ at
$x = 0.29$.


The amplitude of the ringing is largest immediately on either side of the
point of discontinuity, where it is about 9% of the jump in $f$. This magnitude
is determined by the area under the central spike in $\delta^D_\Lambda(x)$, which is

$$\frac{1}{\pi} \int_{-\pi/\Lambda}^{\pi/\Lambda} \frac{\sin(\Lambda x)}{x}\, dx = 1.18\ldots, \tag{B.49}$$

independent of $\Lambda$. For $x$ exactly at the point of discontinuity, $f_\Lambda(x)$ receives
equal contributions from both sides of the jump and hence converges to the
average

$$\lim_{\Lambda \to \infty} f_\Lambda(x) = \frac{1}{2} \left\{ f(x_+) + f(x_-) \right\}, \tag{B.50}$$

where $f(x_\pm)$ are the limits of $f$ taken from the right and left, respectively.
When $x = x_0 - \pi/\Lambda$, however, the central spike lies entirely to the left of the
point of discontinuity and

$$\begin{aligned}
f_\Lambda(x) &\approx \frac{1}{2} \left\{ (1 + 1.18) f(x_-) + (1 - 1.18) f(x_+) \right\} \\
&\approx f(x_-) + 0.09 \left\{ f(x_-) - f(x_+) \right\}.
\end{aligned} \tag{B.51}$$

Consequently, $f_\Lambda(x)$ overshoots its target $f(x_-)$ by approximately 9% of the
discontinuity. Similarly when $x = x_0 + \pi/\Lambda$

$$f_\Lambda(x) \approx f(x_+) + 0.09 \left\{ f(x_+) - f(x_-) \right\}. \tag{B.52}$$
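The value 1.18 in (B.49), and its independence of $\Lambda$, can be confirmed with a few lines of numerical quadrature. This is a sketch using only the standard library; the function names and the step count are assumptions. The midpoint rule is used so that the integrand is never evaluated at $x = 0$.

```python
import math


def central_spike_area(cutoff, steps=2000):
    """Midpoint-rule estimate of (1/pi) * integral of sin(cutoff*x)/x over |x| < pi/cutoff."""
    half_width = math.pi / cutoff
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * h   # never exactly zero because steps is even
        total += math.sin(cutoff * x) / x
    return total * h / math.pi


def overshoot_fraction(cutoff):
    """Overshoot as a fraction of the jump, following the estimate in (B.51)."""
    return 0.5 * (central_spike_area(cutoff) - 1.0)


for cutoff in (3.0, 50.0, 1000.0):
    print(cutoff, central_spike_area(cutoff), overshoot_fraction(cutoff))
```

Each row prints an area close to 1.179 and an overshoot close to 0.0895, whatever the value of the cutoff.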
The ringing is a consequence of the abrupt truncation of the Fourier sum.
If, instead of a sharp cutoff, we gradually de-emphasize the higher frequencies
by the replacement

$$\tilde{f}(k) \to \tilde{f}(k)\, e^{-\alpha k^2/2} \tag{B.53}$$

then

$$\begin{aligned}
f_\alpha(x) &= \int_{-\infty}^{\infty} \tilde{f}(k)\, e^{-\alpha k^2/2}\, e^{-ikx}\, \frac{dk}{2\pi} \\
&= \int_{-\infty}^{\infty} \delta^G_\alpha(x - \xi)\, f(\xi)\, d\xi,
\end{aligned} \tag{B.54}$$

where

$$\delta^G_\alpha(x) = \frac{1}{\sqrt{2\pi\alpha}}\, e^{-x^2/2\alpha} \tag{B.55}$$

is a non-oscillating Gaussian approximation to a delta function. The effect
of this convolution is to smooth out, or mollify, the original $f$, resulting in
a $C^\infty$ function. As $\alpha$ becomes small, the limit of $f_\alpha(x)$ will again be the
local average of $f(x)$, so at a discontinuity $f_\alpha$ will converge to the mean
$\frac{1}{2}\{f(x_+) + f(x_-)\}$.

When reconstructing a signal from a finite range of its Fourier components
(for example from the output of an aperture-synthesis radio-telescope) it is
good practice to smoothly suppress the higher frequencies in such a manner.
This process is called apodizing (i.e. cutting off the feet of) the data. If
we fail to apodize then any interesting sharp feature in the signal will be
surrounded by “diffraction ring” artifacts.
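To see the difference that apodization makes, consider a unit step at $x = 0$. Convolving the step with the Dirichlet kernel gives $\frac{1}{2} + \mathrm{Si}(\Lambda x)/\pi$, where $\mathrm{Si}$ is the sine integral, whereas convolving it with the Gaussian (B.55) gives $\frac{1}{2}\{1 + \mathrm{erf}(x/\sqrt{2\alpha})\}$. The sketch below compares the two; the function names and the particular values of $\Lambda$ and $\alpha$ are assumptions chosen for illustration.

```python
import math


def sine_integral(z, steps=2000):
    """Midpoint-rule estimate of Si(z) = integral_0^z sin(u)/u du."""
    if z == 0.0:
        return 0.0
    h = z / steps
    return h * sum(math.sin((j + 0.5) * h) / ((j + 0.5) * h) for j in range(steps))


def sharp_cutoff_step(x, cutoff):
    """Unit step at x = 0 rebuilt from wavenumbers |k| < cutoff only."""
    return 0.5 + sine_integral(cutoff * x) / math.pi


def gaussian_apodized_step(x, alpha):
    """Unit step at x = 0 after the weight exp(-alpha k^2 / 2) has been applied."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * alpha)))


xs = [0.01 * j for j in range(1, 301)]            # points to the right of the jump
sharp_max = max(sharp_cutoff_step(x, 10.0) for x in xs)
smooth_max = max(gaussian_apodized_step(x, 0.01) for x in xs)
print(sharp_max, smooth_max)
```

The sharply truncated reconstruction peaks near 1.09, the 9% Gibbs overshoot, while the Gaussian-apodized one rises monotonically and never exceeds 1.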


Exercise B.1: Suppose that we exponentially suppress the higher frequencies
by multiplying the Fourier amplitude $\tilde{f}(k)$ by $e^{-\epsilon|k|}$. Show that the original
signal is smoothed by convolution with a Lorentzian approximation to a delta
function

$$\delta^L_\epsilon(x - \xi) = \frac{1}{\pi} \frac{\epsilon}{(x - \xi)^2 + \epsilon^2}.$$

Observe that

$$\lim_{\epsilon \to 0} \delta^L_\epsilon(x) = \delta(x).$$


Exercise B.2: Consider the apodized Fourier series

$$f_r(\theta) = \sum_{n=-\infty}^{\infty} a_n r^{|n|} e^{in\theta},$$

where the parameter $r$ lies in the range $0 < r < 1$, and the coefficients are

$$a_n \equiv \frac{1}{2\pi} \int_0^{2\pi} e^{-in\theta} f(\theta)\, d\theta.$$

Assuming that it is legitimate to interchange the order of the sum and integral,
show that

$$\begin{aligned}
f_r(\theta) &= \int_0^{2\pi} \delta^P_r(\theta - \theta')\, f(\theta')\, d\theta' \\
&\equiv \frac{1}{2\pi} \int_0^{2\pi} \left( \frac{1 - r^2}{1 - 2r\cos(\theta - \theta') + r^2} \right) f(\theta')\, d\theta'.
\end{aligned}$$

Here the superscript P stands for Poisson because $\delta^P_r(\theta)$ is the Poisson
kernel that solves the Dirichlet problem in the unit disc. Show that $\delta^P_r(\theta)$
tends to a delta function as $r \to 1$ from below.


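Exercise B.3: The periodic Hilbert transform. Show that in the limit $r \to 1$
the sum

$$\sum_{n=-\infty}^{\infty} \mathrm{sgn}(n)\, e^{in\theta} r^{|n|} = \frac{re^{i\theta}}{1 - re^{i\theta}} - \frac{re^{-i\theta}}{1 - re^{-i\theta}}, \qquad 0 < r < 1,$$

becomes the principal-part distribution

$$P\left( i\cot\left( \frac{\theta}{2} \right) \right).$$

Let $f(\theta)$ be a smooth function on the unit circle, and define its Hilbert
transform $\mathcal{H}f$ to be

$$(\mathcal{H}f)(\theta) = \frac{1}{2\pi} P \int_0^{2\pi} f(\theta') \cot\left( \frac{\theta - \theta'}{2} \right) d\theta'.$$

Show the original function can be recovered from $(\mathcal{H}f)(\theta)$, together with
knowledge of the angular average $\bar{f} = \int_0^{2\pi} f(\theta)\, d\theta/2\pi$, as

$$f(\theta) = -\frac{1}{2\pi} P \int_0^{2\pi} (\mathcal{H}f)(\theta') \cot\left( \frac{\theta - \theta'}{2} \right) d\theta' + \frac{1}{2\pi} \int_0^{2\pi} f(\theta')\, d\theta'.$$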
Exercise B.4: Find a closed-form expression for the sum

$$\sum_{n=-\infty}^{\infty} |n|\, e^{in\theta} r^{|n|}, \qquad 0 < r < 1.$$

Now let $f(\theta)$ be a smooth function defined on the unit circle and

$$a_n \equiv \frac{1}{2\pi} \int_0^{2\pi} f(\theta)\, e^{-in\theta}\, d\theta$$

its $n$-th Fourier coefficient. By taking a limit $r \to 1$, show that

$$\sum_{n=-\infty}^{\infty} |n|\, a_n a_{-n} = \frac{1}{4} \int_0^{2\pi} \int_0^{2\pi} [f(\theta) - f(\theta')]^2\, \mathrm{cosec}^2\left( \frac{\theta - \theta'}{2} \right) \frac{d\theta}{2\pi} \frac{d\theta'}{2\pi},$$

both the sum and integral being convergent. Show that these last two expressions
are equal to

$$\frac{1}{2\pi} \iint_{r<1} |\nabla\varphi|^2\, r\, dr\, d\theta$$

where $\varphi(r, \theta)$ is the function harmonic in the unit disc, whose boundary value
is $f(\theta)$.


Exercise B.5: Let $\tilde{f}(k)$ be the Fourier transform of the smooth real function
$f(x)$. Take a suitable limit in the previous problem to show that

$$S[f] \equiv \frac{1}{4\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left\{ \frac{f(x) - f(x')}{x - x'} \right\}^2 dx\, dx' = \frac{1}{2} \int_{-\infty}^{\infty} |k|\, |\tilde{f}(k)|^2\, \frac{dk}{2\pi}.$$


Exercise B.6: By taking a suitable limit in exercise B.3 show that, when acting
on smooth functions $f$ such that $\int_{-\infty}^{\infty} |f|\, dx$ is finite, we have $\mathcal{H}(\mathcal{H}f) = -f$,
where

$$(\mathcal{H}f)(x) = \frac{P}{\pi} \int_{-\infty}^{\infty} \frac{f(x')}{x - x'}\, dx'$$

defines the Hilbert transform of a function on the real line. (Because $\mathcal{H}$ gives
zero when acting on a constant, some condition, such as $\int_{-\infty}^{\infty} |f|\, dx$ being finite,
is necessary if $\mathcal{H}$ is to be invertible.)


B.4 The Poisson summation formula


Suppose that f(x) is a smooth function that tends rapidly to zero at infinity. 
Then the series 


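$$F(x) = \sum_{n=-\infty}^{\infty} f(x + nL) \tag{B.56}$$

converges to a smooth function of period $L$. It therefore has a Fourier
expansion

$$F(x) = \sum_{m=-\infty}^{\infty} a_m e^{-2\pi i m x/L}. \tag{B.57}$$

We can compute the Fourier coefficients $a_m$ by integrating term-by-term

$$\begin{aligned}
a_m &= \frac{1}{L} \int_0^L F(x)\, e^{2\pi i m x/L}\, dx \\
&= \frac{1}{L} \sum_{n=-\infty}^{\infty} \int_0^L f(x + nL)\, e^{2\pi i m x/L}\, dx \\
&= \frac{1}{L} \int_{-\infty}^{\infty} f(x)\, e^{2\pi i m x/L}\, dx \\
&= \frac{1}{L}\, \tilde{f}(2\pi m/L).
\end{aligned} \tag{B.58}$$

Thus

$$\sum_{n=-\infty}^{\infty} f(x + nL) = \frac{1}{L} \sum_{m=-\infty}^{\infty} \tilde{f}(2\pi m/L)\, e^{-2\pi i m x/L}. \tag{B.59}$$

When we set $x = 0$, this last equation becomes

$$\sum_{n=-\infty}^{\infty} f(nL) = \frac{1}{L} \sum_{m=-\infty}^{\infty} \tilde{f}(2\pi m/L). \tag{B.60}$$

The equality of this pair of doubly infinite sums is known as the Poisson
summation formula.

Example: As the Fourier transform of a Gaussian is another Gaussian, the
Poisson formula with $L = 1$ applied to $f(x) = \exp(-\kappa x^2)$ gives

$$\sum_{m=-\infty}^{\infty} e^{-\kappa m^2} = \sqrt{\frac{\pi}{\kappa}} \sum_{m=-\infty}^{\infty} e^{-m^2\pi^2/\kappa}, \tag{B.61}$$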
and (rather more usefully) applied to $\exp(-\frac{1}{2}tx^2 + ix\theta)$ gives

$$\sum_{n=-\infty}^{\infty} e^{-\frac{1}{2}tn^2 + in\theta} = \sqrt{\frac{2\pi}{t}} \sum_{n=-\infty}^{\infty} e^{-\frac{1}{2t}(\theta + 2\pi n)^2}. \tag{B.62}$$

The last identity is known as Jacobi’s imaginary transformation. It reflects
the equivalence of the eigenmode expansion and the method-of-images solution
of the diffusion equation

$$\frac{1}{2} \frac{\partial^2 y}{\partial \theta^2} = \frac{\partial y}{\partial t} \tag{B.63}$$
on the unit circle. Notice that when $t$ is small the sum on the left-hand side
converges very slowly, whereas the sum on the right converges very rapidly.
The opposite is true for large $t$. The conversion of a slowly converging series
into a rapidly converging one is a standard application of the Poisson sum- 
mation formula. It is the prototype of many duality maps that exchange a 
physical model with a large coupling constant for one with weak coupling. 
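Jacobi’s imaginary transformation (B.62) is easy to test numerically, and doing so makes the trade-off in convergence rates concrete. In the sketch below (standard library only; the function names and the sample values of $t$ and $\theta$ are assumptions), `eigenmode_sum` evaluates the sum of $e^{-tn^2/2 + in\theta}$ and `image_sum` evaluates the Gaussian sum multiplied by $\sqrt{2\pi/t}$, each truncated to $|n| \le$ `terms`.

```python
import cmath
import math


def eigenmode_sum(t, theta, terms=200):
    """Truncated sum over n of exp(-t n^2 / 2 + i n theta)."""
    return sum(cmath.exp(-0.5 * t * n * n + 1j * n * theta)
               for n in range(-terms, terms + 1))


def image_sum(t, theta, terms=200):
    """Truncated sqrt(2 pi / t) * sum over n of exp(-(theta + 2 pi n)^2 / (2 t))."""
    return math.sqrt(2.0 * math.pi / t) * sum(
        math.exp(-(theta + 2.0 * math.pi * n) ** 2 / (2.0 * t))
        for n in range(-terms, terms + 1))


theta = 0.3
for t in (0.05, 0.5, 5.0):
    print(t, eigenmode_sum(t, theta).real, image_sum(t, theta))

# For small t a single image term already does very well, whereas the
# eigenmode sum needs many terms to reach the same accuracy.
print(image_sum(0.05, theta, terms=0), eigenmode_sum(0.05, theta, terms=3).real)
```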

If we take the limit $t \to 0$ in (B.62), the right hand side approaches a
sum of delta functions, and so gives us the useful identity

$$\frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{inx} = \sum_{n=-\infty}^{\infty} \delta(x + 2\pi n). \tag{B.64}$$

The right-hand side of (B.64) is sometimes called the “Dirac comb.”


Gauss sums 


The Poisson sum formula

$$\sqrt{\frac{\pi}{\kappa}} \sum_{m=-\infty}^{\infty} e^{-m^2\pi^2/\kappa} = \sum_{m=-\infty}^{\infty} e^{-\kappa m^2} \tag{B.65}$$

remains valid for complex $\kappa$, provided that $\mathrm{Re}\,\kappa > 0$. We can therefore
consider the special case

$$\kappa = i\pi\frac{p}{q} + \epsilon, \tag{B.66}$$

where $\epsilon$ is a positive real number and $p$ and $q$ are positive integers whose
product $pq$ we assume to be even. We investigate what happens to (B.65)
as $\epsilon \to 0$.
The right-hand side of (B.65) can be decomposed into the double sum

$$\sum_{m=-\infty}^{\infty} \sum_{r=0}^{q-1} e^{-i\pi(p/q)(r + mq)^2}\, e^{-\epsilon(r + mq)^2}. \tag{B.67}$$

Because $pq$ is even, each term in $e^{-i\pi(p/q)(r + mq)^2}$ is independent of $m$. At the
same time, the small $\epsilon$ limit of the infinite sum

$$\sum_{m=-\infty}^{\infty} e^{-\epsilon(r + mq)^2}, \tag{B.68}$$

being a Riemann sum for the integral

$$\int_{-\infty}^{\infty} e^{-\epsilon q^2 m^2}\, dm = \frac{1}{q} \sqrt{\frac{\pi}{\epsilon}}, \tag{B.69}$$

becomes independent of $r$, and so a common factor of all terms in the finite
sum over $r$.
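If $\epsilon$ is small, we can make the replacement,

$$\kappa^{-1} = \frac{\epsilon - i\pi p/q}{\epsilon^2 + \pi^2 p^2/q^2} \to \frac{\epsilon - i\pi p/q}{\pi^2 p^2/q^2}, \tag{B.70}$$

after which, the left-hand side of (B.65) contains the double sum

$$\sum_{m=-\infty}^{\infty} \sum_{r=0}^{p-1} e^{i\pi(q/p)(r + mp)^2}\, e^{-\epsilon(q^2/p^2)(r + mp)^2}. \tag{B.71}$$

Again each term in $e^{i\pi(q/p)(r + mp)^2}$ is independent of $m$, and

$$\sum_{m=-\infty}^{\infty} e^{-\epsilon(q^2/p^2)(r + mp)^2} \to \int_{-\infty}^{\infty} e^{-\epsilon q^2 m^2}\, dm = \frac{1}{q} \sqrt{\frac{\pi}{\epsilon}} \tag{B.72}$$

becomes independent of $r$. Also

$$\lim_{\epsilon \to 0} \sqrt{\frac{\pi}{\kappa}} = e^{-i\pi/4} \sqrt{\frac{q}{p}}. \tag{B.73}$$

Thus, after cancelling the common factor of $(1/q)\sqrt{\pi/\epsilon}$, we find that

$$\sqrt{\frac{1}{q}} \sum_{r=0}^{q-1} e^{-i\pi(p/q)r^2} = e^{-i\pi/4} \sqrt{\frac{1}{p}} \sum_{r=0}^{p-1} e^{i\pi(q/p)r^2}, \qquad pq \text{ even}. \tag{B.74}$$

This Poisson-summation-like equality of finite sums is known as the Landsberg-
Schaar identity. No purely algebraic proof is known.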
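Gauss considered the special case $p = 2$, in which case we get

$$\frac{1}{\sqrt{q}} \sum_{r=0}^{q-1} e^{-2\pi i r^2/q} = e^{-i\pi/4}\, \frac{1}{\sqrt{2}} \left( 1 + e^{i\pi q/2} \right) \tag{B.75}$$

or, more explicitly,

$$\sum_{r=0}^{q-1} e^{-2\pi i r^2/q} = \begin{cases} (1 - i)\sqrt{q}, & q = 0 \pmod 4, \\ \sqrt{q}, & q = 1 \pmod 4, \\ 0, & q = 2 \pmod 4, \\ -i\sqrt{q}, & q = 3 \pmod 4. \end{cases} \tag{B.76}$$

The complex conjugate result is perhaps slightly prettier:

$$\sum_{r=0}^{q-1} e^{2\pi i r^2/q} = \begin{cases} (1 + i)\sqrt{q}, & q = 0 \pmod 4, \\ \sqrt{q}, & q = 1 \pmod 4, \\ 0, & q = 2 \pmod 4, \\ i\sqrt{q}, & q = 3 \pmod 4. \end{cases} \tag{B.77}$$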
Gauss used these sums to prove the law of quadratic reciprocity. 
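Both the Gauss sums (B.76) and the Landsberg-Schaar identity (B.74) involve only finite sums, so they can be checked by direct evaluation. The following sketch uses only the standard library; the function names and the particular values of $p$ and $q$ are assumptions made for the example.

```python
import cmath
import math


def gauss_sum(q):
    """Direct evaluation of sum_{r=0}^{q-1} exp(-2 pi i r^2 / q)."""
    return sum(cmath.exp(-2j * math.pi * r * r / q) for r in range(q))


def landsberg_schaar_sides(p, q):
    """Return the left- and right-hand sides of (B.74); requires p*q even."""
    lhs = sum(cmath.exp(-1j * math.pi * p * r * r / q) for r in range(q)) / math.sqrt(q)
    rhs = cmath.exp(-1j * math.pi / 4) * sum(
        cmath.exp(1j * math.pi * q * r * r / p) for r in range(p)) / math.sqrt(p)
    return lhs, rhs


for q in (5, 6, 7, 8):
    print(q, q % 4, gauss_sum(q) / math.sqrt(q))   # 1, 0, -i, 1 - i up to round-off

print(landsberg_schaar_sides(3, 4))                # the two sides agree
```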


Exercise B.7: By applying the Poisson summation formula to the Fourier
transform pair

$$f(x) = e^{-\epsilon|x|}\, e^{-ix\theta}, \quad \text{and} \quad \tilde{f}(k) = \frac{2\epsilon}{\epsilon^2 + (k - \theta)^2},$$

where $\epsilon > 0$, deduce that

$$\frac{\sinh \epsilon}{\cosh \epsilon - \cos(\theta - \theta')} = \sum_{n=-\infty}^{\infty} \frac{2\epsilon}{\epsilon^2 + (\theta - \theta' + 2\pi n)^2}. \tag{B.78}$$

Hence show that the Poisson kernel is equivalent to an infinite periodic sum
of Lorentzians

$$\frac{1}{2\pi} \left( \frac{1 - r^2}{1 - 2r\cos(\theta - \theta') + r^2} \right) = -\frac{1}{\pi} \sum_{n=-\infty}^{\infty} \frac{\ln r}{(\ln r)^2 + (\theta - \theta' + 2\pi n)^2}.$$


foliation, 427
form 
n-linear, 853 
closed, 445 
quadratic, 865 
symplectic, 868 
Fourier series, 63, 876 
Fourier, Joseph, 205 
Fredholm 
determinant, 380 
equation, 349 
operator, 372, 543 
series, 381 
Fredholm alternative 
for differential operators, 155 
for integral equations, 360 
for systems of linear equations, 
851 
Fresnel integrals, 758 
Friedel sum rule, 153
Frobenius’ 
integrability theorem, 428 
reciprocity theorem, 593 
Frobenius-Schur indicator, 590 
Fréchet derivative, see functional 
derivative 
function space, 56 
normed, 57 
functional 
definition, 2 
derivative, 3 
local, 2 


Gauss 
quadrature, 86 
linking number, 486

sum, 891 
Gauss-Bonnet theorem, 540, 674 
Gauss-Bruhat decomposition, 783 
Gaussian elimination, 367 
Gelfand-Dikii equation, 184, 192 
Gell-Mann “λ” matrices, 631
generalized functions, see distributions

generating function 

for Legendre polynomials, 306 

for Bessel functions, 314 

for Chern character, 537 
genus, 735 
geometric phase, see Berry’s phase 
geometric quantization, 655 
Gibbs’ phenomenon, 883 
gradient 

as a covector, 423 

in curvilinear co-ordinates, 298 
Gram-Schmidt procedure, 69, 304 
Grassmann, Herman, 400 
Green function 

analyticity of causal, 171 

causal, 162, 207, 219 

construction of, 157 

modified, 165, 171 

symmetry of, 167 
Green, George, 411 
group velocity, 261 


Haar measure, 619 

half-range Fourier series, 877 

hanging chain, see catenary 

Hankel function, 313 
spherical, 329 

harmonic conjugate, 684 

harmonic oscillator, 133 
Haydock recursion, 90
Heaviside function, 93 
Helmholtz 
decomposition, 228 
equation, 246, 313 
Helmholtz-Hodge decomposition, 251 
Hermitian 
differential operator, see formally 
self-adjoint operator 
matrix, 114, 859, 861 
Hermitian conjugate 
matrix, 846 
operator, see adjoint 
heterojunction, 126 
Hilbert space, 61 
rigged, 81 
Hilbert transform, 768, 886, 887 
Hodge 
“⋆” map, 442, 740
decomposition, 254, 543 
theory, 541 
Hodge, William, 541 
homeomorphism, 502 
homology group, 514 
homotopy, 482, 615 
class, 482 
Hopf 
bundle, see monopole bundle 
index, 484, 611 
map, 479, 608, 611 
horocycles, 745 
hydraulic jump, 280 


ideal, 623 

identity 
delta function as, 74 
matrix, 839 

image space, 839 




images, method of, 244 
immersion, 737 
index 
of operator, 851 
index theorem, 544, 781, 784 
indicial equation, 107 
induced metric, 469 
induced representation, 592 
inequality 
Cauchy-Schwarz-Bunyakovsky, 61 
triangle, 59, 60, 62 
inertial wave, 292 
infimum, 57 
infinitesimal homotopy relation, 439 
integral equation 
Fredholm, 349 
Volterra, 349 
integral kernel, 74 
interior multiplication, 439 
intersection form, 530 


Jacobi identity, 447, 623 
Jordan form, 799, 863 
jump condition, 158 


Kelvin wedge, 265 
kernel, 839 
Killing 

field, 433 

form, 625 
Killing, William, 433 
Kirchhoff approximation, 248 
Korteweg de Vries (KdV) equation, 112
Kramers’ degeneracy, 599
Kramers-Kronig relation, 179 


Lagrange interpolation formula, 86 
Lagrange multiplier, 37 




as eigenvalue, 38 
Lagrange’s identity, 115 
Lagrange’s theorem, 560 
Lagrange, Joseph-Louis, 11 
Lagrangian, 11 

density, 19 
Lamé constants, 408 
Lanczos algorithm, 90 
Landsberg-Schaar identity, 891 
Laplace development, 856 
Laplace-Beltrami operator, 542 
Laplacian 

acting on vector field, 251, 252, 

301, 540 

in curvilinear co-ordinates, 301 
Lax pair, 112, 288 
least upper bound, see supremum 
Legendre function, 764 
Legendre function Q,,(x), 807 
Levi-Civita symbol, 403 
Levinson’s theorem, 153 
Lie 

algebra, 595 

bracket, 426, 622 

derivative, 431 
Lie, Sophus, 595 
limit-circle case, 332 
line bundle, 648 
linear dependence, 836 
linear map, 838 
linear operator 

bounded, 371 

closable, 375 

closed, 374 

compact, 371 

Fredholm, 372 

Hilbert-Schmidt, 372 
Liouville measure, 38 
Liouville’s theorem, 100

Liouville-Neumann series, 378 

Lipshitz’ formula, 774 

Lobachevski geometry, 43, 46, 497, 
744 


LU decomposition, see Gaussian elimination


manifold, 420 
orientable, 465 
parallelizable, 642 
Riemann, 453 
symplectic, 445 
map 
anti-conformal, 688 
isogonal, 688 
Möbius, 730 
Maupertuis, Pierre de, 11 
Maxwell’s equations, 24 
Maxwell, James Clerk, 24 
measure, Liouville, 38 
Mehler’s formula, 72 
metric tensor, 842 
minimal polynomial equation, 860 
modular group, 829 
monodromy, 107, 798 
monopole bundle, 655, 670 
moonshine, monstrous, 560, 831 
Morse 
function, 546 
index, 546 
Morse function, 545 
multilinear form, 397 
multipliers, undetermined, see Lagrange multipliers 
Möbius 
map, 730, 825 
strip, 648 


Neumann function, 312 
Neumann’s formula, 764 
Noether’s theorem, 18 
Noether, Emmy, 16 
non-degenerate, 842 
norm 

L^p, 59 

definition of, 59 

sup, 99 
nullspace, see kernel 
Nyquist criterion, 172, 765 


observables, 123 
optical fibre, 326 
orbit, of group action, 564 
order 

of group, 558 
orientable manifold, 464 
orthogonal 

complement, 847 
orthonormal set, 62 


Pöschel-Teller equation, 134, 143, 148, 343, 805 
pairing, 80, 388, 525, 841 
Parseval’s theorem, 68, 879 
particular integral, 103, 169 
Pauli σ matrices, 479, 599 
Peierls, Rudolph, 138, 330 
period 
and de Rham’s theorem, 526 
of elliptic function, 735 
Peter-Weyl theorem, 619 
Pfaffian system, 429 
phase shift, 139, 336 
phase space, 38 
phase velocity, 261 
Plücker relations, 402, 417 
Plücker, Julius, 402 
Plateau’s problem, 1, 4 
Plemelj formulae, 762 
Plemelj, Josip, 177 
Poincaré 
disc, 46, 497, 744 
duality, 545 
lemma, 436, 503 
Poincaré-Hopf theorem, 546 
Poincaré-Bertrand theorem, 772 
point 
ordinary, 106 
regular singular, 107 
singular, 106 
singular endpoint, 332 
Poisson 
bracket, 446 
kernel, 235, 891 
summation, 888 
Poisson’s ratio, 409 
pole, 698 
polynomials 
Hermite, 71, 134 
Legendre, 70, 303, 373 
orthogonal, 69 
Tchebychef, 73, 361 
Pontryagin class, 539 
pressure, 28 
principal bundle, 647 
principal part integral, 84, 177, 361, 754 
principle of least action, see action principle 
product 
cup, 529 
direct, 567 
group axioms, 557 
inner, 841 
of matrices, 839 

tensor, 396 

wedge, 399, 435 
projective plane, 515 
pseudo-momentum, 23 
pseudo-potential, 339 


quadratic form, 865 
diagonalizing, 865 
signature of, 867 

quaternions, 599 

quotient 
of vector spaces, 847 
group, 560 
space, 565 


range, see image space 
range-nullspace theorem, 839 
rank 
column, 839 
of Lie algebra, 635 
of matrix, 839 
of tensor, 391 
Rayleigh-Ritz method, 131 
realm, sublunary, 137 
recurrence relation 
for Bessel functions, 315, 332 
for orthogonal polynomials, 69 
residue, 698 
resolution of the identity, 577, 849, 862, 865 
resolvent, 179 
resonance, 141, 339 
retraction, 503 
Riemann 
P symbol, 802 
sum, 694 
surface, 732 
Riemann-Hilbert problem, 367 
Riemann-Lebesgue lemma, 880 
Riesz-Fréchet theorem, 81, 85 
Rodrigues’ formula, 70, 304, 807 
rolling conditions, 430, 493 

root vector, 632 

Routhian, 15 

Russian formula, 678 


scalar product, see product, inner 
scattering length, 336 
Schwartz space, 79 
Schwartz, Laurent, 79 
Schwarzian derivative, 110 
Scott Russell, John, 286 
section, 649 

of bundle, 425 
Seeley coefficients, 185 
self-adjoint extension, 124 
self-adjoint matrix, 859 
self-adjoint operator 

formally, 116 

truly, 122 
seminorm, 79 
Serret-Frenet relations, 494 
sesquilinear, 841 
sextant, 613 
shear modulus, 408 
sheet, 732 
simplex, 509 
simplicial complex, 509 
singular endpoint, 332 
singular integral equations, 361 
skew-symmetric form, see symplectic form 

skyrmion, 476 
soap film, 4 
soliton, 112, 284 
space 
L^p, 59 
Banach, 61 
Hilbert, 61 
homogeneous, 565 
of C^n functions, 56 
of test functions, 79 
retractable, 503 
spanning set, 837 
spectrum 
continuous, 129, 333 
discrete, 129 
point, see discrete spectrum 
spherical harmonics, 309 
spinor, 479, 612 
stereographic map, 477, 728 
Stokes’ 
line, 822 
phenomenon, 816 
theorem, 470 
strain tensor, 434 
stream-function, 685 
streamline, 685 
string 
sliding, 31 
vibrating, 20 
structure constants, 602 
Sturm-Liouville operator, 40, 41, 76, 116 
supremum, 57 
symmetric differential operator, see formally self-adjoint operator 
symplectic form, 445, 868 


tangent 
bundle, 420 
space, 419 
tantrix, 490 

Taylor column, 292 

tensor 
Cartesian, 404 
curvature, 453 
elastic constants, 45 
energy-momentum, 22 
isotropic, 405 
metric, 842 
momentum flux, 29 
strain, 45, 406, 434 
stress, 45, 406 
torsion, 452 

test function, 79 

theorem 
Abel, 712 
addition 

for spherical harmonics, 310 
Blasius, 707 
Cayley’s, 857 
Darboux, 446 
de Rham, 526 
Frobenius integrability, 428 
Frobenius’ reciprocity, 593 
Gauss-Bonnet, 540, 674 
Green’s, 242 
Lagrange, 560 
mean value for harmonic functions, 235 

Morse index, 546 
Peter-Weyl, 619 
Picard, 724 
Poincaré-Hopf, 546 
Poincaré-Bertrand, 772 
range-nullspace, 839 
residue, 698 
Riemann mapping, 690 
Riesz-Fréchet, 81, 85 
Stokes, 470 
Weierstrass approximation, 70 
Weyl’s, 332 
Theta function, 727 
tidal bore, see hydraulic jump 
topological current, 483 
torsion 
in homology, 516 
of curve, 494 


tensor, 452 
transform 

Fourier, 350, 878 
Fourier-Bessel, see Hankel 
Hankel, 322 

Hilbert, 768, 886, 887 

Laplace, 350, 352 
Legendre, 15, 29 

Mellin, 350 

Mellin sine, 236 

Radon, 355 


variational principle, 131 
variety, 398 


Segre, 398 
vector 
bundle, 450 


Laplacian, 251, 252, 301, 540 
vector space 

definition, 835 
velocity potential, 27, 684 

as Lagrange multiplier, 40 
vielbein, 450 

orthonormal, 456, 534 
volume form, 469 
vorticity, 28, 50 


wake, 264 
ship, 265 
wave 
drag, 264 
equation, 200 
momentum, see pseudo-momentum 
non-linear, 274 
shock, 277 
surface, 33, 257 
transverse, 21 
Weber’s disc, 323 
Weierstrass 
℘ function, 825 
approximation theorem, 70 
weight, 632 
Weitzenböck formula, 556 
Weyl’s 
identity, 598, 871 
theorem, 332 
Weyl, Hermann, 138, 332 
Wiener-Hopf 
integral equations, 365 
sum equations, 778 
winding number, 475 
Wronskian, 98 
and linear dependence, 100 
in Green function, 158 
Wulff construction, 51 


Young’s modulus, 409 


